Word Break

Given a string s and a dictionary of words dict, add spaces in s to construct a sentence where each word is a valid dictionary word.

Return all such possible sentences.

For example, given
s = "snakesandladder",
dict = ["snake", "snakes", "and", "sand", "ladder"].

A solution is ["snakes and ladder",
"snake sand ladder"].

Input:
The first line contains an integer T, denoting the number of test cases.
Every test case has 3 lines.
The first line contains an integer N, number of words in the dictionary.
The second line contains N strings denoting the words of the dictionary.
The third line contains a string s.

Output:
For each test case, print all possible strings in one line. Each string is enclosed in (). See Example.
If no string possible print "Empty" (without quotes).

Constraints:
1<=T<=100
1<=N<=20
1<=Length of each word in dictionary <=15
1<=Length of s<=1000

No idea how to start this question even took help from gfg but still not clear, please explain in the easiest manner to solve this type of question without using tries.
Last edited on
Here is an algorithm that would work, I think. I haven't tested it; it's just the obvious algorithm that jumps out. I could well be missing something in it.

Define a "candidate" word to be a group of consecutive letters, of size 1 letter upwards.

First candidate is just the first letter.

For each candidate:
- if it's a word, insert a space and the next candidate is the next letter.
- if it's not a word, the next candidate is this candidate plus the next letter.
Move on to next candidate

Eventually, you will reach the end of the letters (you found a solution), or your final candidate will not be a word. At this point, backtrack to the last point in your calculations where there is a branch not yet taken.

Eventually, your final candidate will be the entire string, and then you're done.


Example: snakesandladder

S - not a word
SN - not a word
SNA - not a word
SNAK - not a word
SNAKE - a word found "SNAKE"
S - not a word
SA - not a word
SAN - not a word
SAND - a word found "SNAKE SAND"
L
LA - a word found "SNAKE SAND LA"
D
DD
DDE
DDER - reached end of strong, no solution found, go back and reject last word in your solution string.

Solution string is now: "SNAKE SAND"
LA - now rejected
LAD - a word found "SNAKE SAND LAD"
D
DE
DER - reached end of strong, no solution found, go back

Solution string is now: "SNAKE SAND"
LAD - now rejected
LADD
LADDE
LADDER - a word found "SNAKE SAND LADDER", whole string used, one solution found, go back TWO successful candidates (i.e. pretend SAND is not a word)


SAND - now rejected, solution string is now "SNAKE"
SANDL
SANDLA
SANDLAD
SANDLADD
SANDLADDE
SANDLADDER - reached end of strong, no solution found, go back and reject last word in solution string

Solutions string is now: " "
SNAKE - rejected
SNAKES - - a word found "SNAKES"
A - a word found, "SNAKES A"

and it goes on...

Basically, you still have a trie, but you aren't building it. You're traversing it. The only thing left for you to do is work out how you will know how many words back to go when you have to reverse course.


Last edited on
Hello oyehoye6906,

Correct me if I am wrong, but I get the impression that the assignment is to work with the member functions of the string class.

I also get the impression that there is an input file that yo will be reading from. If so please post the input file of a fiar sample so that everyone will be working with the same information.

I would start with opening the input file and reading it into variables so you have something to work with. Then use a "cout" to print what you have read for now. Later you can delete the "cout" statements.

As for the string functions you can start here to see what you have to work with http://www.cplusplus.com/reference/string/string/

Based on the opening of your post one of the functions will be "insert".

Hope that helps,

Andy
An alternative method is to identify all possible ways of breaking up the string, and then filtering out all the invalid ones.

For example, the input string "abc" can be split up as:

abc (one word)
a bc (two words)
ab c (two words)
a b c (three words)

Then, each of these generated candidate answers is checked; each individual word in them checked to see if it's a word from the dictionary. If any word isn't in the dictionary, that candidate answer is rejected.
Last edited on
Simple recursive solution:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <iostream>
#include <string>
#include <vector>
#include <cstring>

using namespace std;

void findSentences(vector<string> &dict, const char *s,
		  const string &sentenceSoFar)
{
    if (*s == 0) {
	// base case. You found a sentence!
	cout << sentenceSoFar << '\n';
    } else {
	// recursive case
	for (auto &word : dict) {
	    if (strncmp(word.c_str(), s, word.size()) == 0) {
		string newSentence{sentenceSoFar};
		if (newSentence.size()) {
		    newSentence += ' ';
		}
		newSentence += word;
		findSentences(dict, s+word.size(), newSentence);
	    }
	}
    }
}

int main()
{
    vector<string> dict {"snake", "snakes", "and", "sand", "ladder"};
    findSentences(dict, "snakesandladder", "");
}


There are several inefficiencies in this. For a large dictionary, you should sort the dictionary and do a binary search to find the matching words. It would also be faster to avoid creating a new sentence for each recursive call and instead just add the new word to sentence, pass sentence by reference to the recursive call, and then remove the word. Finally, you could always append the space when doing recursive calls and print beyond it when you find a sentence.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <iostream>
void findSentences(vector<string> &dict, const char *s,
		  string &sentenceSoFar)
{
...
	    if (strncmp(word.c_str(), s, word.size()) == 0) {
		unsigned oldLen = sentenceSoFar.size();
		sentenceSoFar+= ' ';
		sentenceSoFar += word;
		findSentences(dict, s+word.size(), sentenceSoFar);
		sentenceSoFar.erase(oldLen);
	    }
	}
    }
}

int main()
{
    vector<string> dict {"snake", "snakes", "and", "sand", "ladder"};
    string sentence;
    findSentences(dict, "snakesandladder", sentence);
}


There are probably other improvements. The biggest one for a practical application where the dictionary is large would be the binary search.
@dhayden, I can use hash map to search words in almost constant time. So large dictionary won't be a problem.

@repeater, Yeah that's the brute force solution anyone can think of. But we need a time-efficient solution for this. :) Though this is what I was thinking of to store True or false in an array for every range in the string while doing this to make it efficient.

> Yeah that's the brute force solution anyone can think of
I thought you said «No idea how to start this question»

> But we need a time-efficient solution for this.
¿what order do you need?
¿what's the time complexity of the brute-force solution? ¿what kind of cases are bad for it?


also, Repeater leaved out the printing of each solution
try not being so brute there

> store True or false in an array for every range in the string
¿so what does it mean that the range [4;8] = true?
Last edited on
I can use hash map to search words in almost constant time. So large dictionary won't be a problem.
You may be able to search in constant time, but now many searches will you make? Starting at the beginning of string s, you search for the 1-character word formed by the beginning, then the 2 character word then 3, then 4... when do you stop? A naive approach will search all the way up to the entire string length. A better solution is up to the doctionary's maximum word size. But either way, it's possible that a binary search would be faster because that way you know exactly when to stop looking.
how big a dictionary are you going to use?
if it remains relatively small, I would probably search each word in the dictionary in the string, eg is 'sand' in your input (repeatedly if need duplicates) rather than check buildup strings from the input.
you could also make a lookup table of words in words (sand contains a,an, and) so that if you find one you find them all at once, and then look for the smallest words and work up through the list after finding one. (requires you to preprocess your dictionary once).

this may be a problem where best algorithm depends on some criteria for the problem(s) you will be attacking.
Last edited on
I thought of another improvement for when the string is large. The idea is to remember all the words that start at a particular position. If you find yourself looking at that position again, you already have your answer.
dhayden + 1

Memoization is important in this kind of problem.
After that you only need to filter out invalid possibilities with the recursive approach.


HOWEVER, I something smells like OP is not in a 300+ level course... even though “time efficient” does suggest this is an actual Algorithms course...? Maybe OP is just waaaay over his head?

The fact that the dictionary is supplied is a HUGE efficiency gain, though.
Last edited on
Topic archived. No new replies allowed.