Display English words (CurlPP)

Hello

I first asked this question here:
http://www.cplusplus.com/forum/beginner/141021/
But I thought it was worth starting another thread for it.

Someone suggested Markov Chains, but I don't really know how I could implement that in C++.

Anyway, I've got a word generator, but it only displays random characters. I want to make it display real English words.

If no one can explain Markov Chains or other methods, that's okay.
Another method I'm thinking of is checking the word against other websites.

Example:
-if string.generated_word exists on http://wikipedia.com, string.generated_word is true.

-Or by just displaying the content of divisions (<div></div>).
Is this possible by using CurlPP? How?

Any method (and easy is best) to achieve this would be awesome! :)

Thanks for reading,
Niely
Are you writing your program on Linux or Windows? If Linux, there is usually a dictionary file you can reference that contains a large list of English words. I'm not sure if Windows has an equivalent file, but I suspect it doesn't. You could generate a word, then check if it's in the list before displaying it. Or just generate a random number and use the word at that index in the file.
Linux. :)
But the problem is, I don't want to use a dictionary file.

I just want to check if a specific word exists on a website.
I guess I don't understand what you're actually trying to do here... What is your end goal? To randomly generate real English words? Or is the goal to parse web pages looking for specific words? Those two tasks are very very different. You wouldn't want to parse a webpage to verify that a word is real. Also, it would make no sense to parse a webpage looking for a word that you have generated randomly. The word (which may or may not even be real) may or may not exist on the page. I'm not sure what information could be gathered from that exercise. If, on the other hand, you have a predefined list of specific words you want to search for, that's another story.
Check if a word exists on a website.

Like this pseudocode:
random = rand() % 3;   // yields 0, 1, or 2
if (random == 0) {
    word = word + "a";
} else if (random == 1) {
    word = word + "b";
} else {
    word = word + "c";
}
cout << "Final word: " << word << endl;
if (/* word exists on http://wikipedia.com */) {
    result_file << word << endl;   // write the word to result.txt
}


Of course I need for loops and fstreams, but I already have those. This is just pseudocode to show what I want. :)
OK... I think I'm still at a loss as to what you are trying to do here. I assume this has changed from trying to generate real words to an experiment in using some web libraries? Checking if randomly generated strings of characters exist on a webpage doesn't seem to have any informational value, so I can only assume you are doing that to learn how to query web pages from a C++ program? If that is the case, I would recommend a language that would be better suited for that task. Python, Perl, or even Java might be a better choice for web parsing. Your original request appeared to be about randomly generating real words, and if that is still your ultimate goal, reading a webpage to verify your words is the wrong way to do it.
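For the original goal, the Markov chain suggestion from the first post could look roughly like the sketch below. This is only one possible reading of that suggestion: an order-1, character-level chain trained on a list of example words. The class name MarkovWords and the '^' / '$' start and end markers are invented for this example. Note that a chain like this produces word-like strings, not guaranteed real words, so it would still need a word-list check if only real words may be shown.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Order-1 character Markov chain: for each character, remember which
// characters followed it in the training words. '^' marks the start of a
// word and '$' marks the end (both are assumed not to occur in the words).
class MarkovWords {
public:
    void train(const std::vector<std::string>& words) {
        for (const std::string& w : words) {
            char prev = '^';
            for (char c : w) {
                next_[prev].push_back(c);  // duplicates act as frequency weights
                prev = c;
            }
            next_[prev].push_back('$');
        }
    }

    // Walk the chain from the start marker until the end marker is drawn
    // or max_len characters have been produced.
    std::string generate(std::mt19937& rng, std::size_t max_len = 12) const {
        std::string out;
        char prev = '^';
        while (out.size() < max_len) {
            auto it = next_.find(prev);
            if (it == next_.end()) break;  // nothing learned for this state
            const std::vector<char>& options = it->second;
            std::uniform_int_distribution<std::size_t> pick(0, options.size() - 1);
            char c = options[pick(rng)];
            if (c == '$') break;
            out.push_back(c);
            prev = c;
        }
        return out;
    }

private:
    std::map<char, std::vector<char>> next_;
};
```

Training it on a large word list and calling generate() repeatedly gives strings whose letter transitions all occur in real words, which already looks far more English than uniformly random characters. Using the previous two or three characters as the state instead of one would likely improve the results further.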
How do you recommend doing it, then?
Okay, what if I want to take words of a website and display them in the terminal?

Wikipedia.org:
The French revolution of 1789 - 1795 was a real...


C++ Terminal:
The
French
revolution
of
1789
-
1795
was
a
real
...


Is that possible then? :)
Again, I would recommend using a different language that is better suited for this type of task (Python, Perl, Java...). Reading/parsing a webpage is not a trivial task with C++, but this post might give you some ideas on how to do it:
http://www.cplusplus.com/forum/windows/36638/

As for eliminating all of the HTML, scripts, and styles so you just return the text, you'll have a bit of work to parse the stream of data coming back.

Here is an example of how to do this with a language more suited for this type of task:
http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html
It looks like your best bet is to use curl to get the HTML, and an XML parser to get the content of the body tag. Once you have the content, all you are left with is a string that you have to split into words.
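The last step, turning the downloaded text into words, can be sketched with the standard library alone. The example below assumes the page content is already in a std::string (fetching it with curl is not shown) and uses a deliberately naive tag stripper in place of a real XML/HTML parser: it does not handle scripts, styles, comments, or entities such as &amp;. The function names strip_tags and split_words are made up for this sketch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Naively drop everything between '<' and '>' and put a space where each
// tag started, so that words on either side of a tag stay separated.
std::string strip_tags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
            text.push_back(' ');
        } else if (c == '>' && in_tag) {
            in_tag = false;
        } else if (!in_tag) {
            text.push_back(c);
        }
    }
    return text;
}

// Split on whitespace; operator>> skips runs of spaces and newlines.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}
```

Looping over split_words(strip_tags(page)) and printing each element followed by a newline gives the one-word-per-line terminal output asked about earlier in the thread.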
^Yes, but how exactly would that be done in C++?
Look in the thread I posted previously. There is an example in there using the curl libraries.

http://www.cplusplus.com/forum/windows/36638/
I've been as precise as I can without figuring out how to do it myself. Read some documentation.