There are a couple of things you can do, but if you're talking about millions of lines, I would discard one of them:
1. Download a dictionary text file, read it into a vector, then search for each word in that vector (though a set will perform better). With a huge list this risks running out of memory, but it may well work for a normal-sized dictionary.
2. (Discard this one.) Use network programming: find a site that tells you whether a word exists. It won't increase RAM usage the way the first option will, but the connection may be slow or unreliable.
3. Search for a library ;). Code::Blocks uses one of these.
Read your dictionary into a std::set. A standard English dictionary (words only) will easily fit in memory. Then read the file of words you want to test sequentially; there's no reason to read it into memory in its entirety. Test each word with set::find and write it to a new file if it's found in your dictionary set.
One thing you didn't indicate is whether you need to detect duplicate words in the file. That's harder if the word file isn't in alphabetical order.
The list doesn't have any duplicates and is already in alphabetical order. I just need to get rid of unwanted gibberish and words with numbers/symbols in them. All I want is to see which words in the list are actual, real English words.
How do I compare the lists with each other? Sorry if I sound really stupid; I'm pretty new to this.
I have my new list of English words that I want it to be compared to.
I want it to get rid of any line that contains a word that isn't in an English dictionary.
You'll have to adjust the following, which assumes that "line" is a single token and uses stringstreams rather than file streams for my convenience, but it should give you the basic idea: