I am reading an HTML file one word at a time and inserting each word into a tree. However, I want to ignore the whole line if it starts with a tag (i.e. with '<') and move on to the next line. Does anyone have a trick for this? Below is what I am working with.
Here, input is the std::ifstream and word is a std::string:
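input >> word;                            // next whitespace-delimited word from the file
if (!word.empty() && word.front() == '<') // does the word start with a tag?
{
    // ...how do I skip the rest of this line?
}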
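A minimal sketch of the line-based approach, assuming the tree can be stood in for by std::set<std::string> and using a hypothetical helper name, collectWords: read whole lines with std::getline, skip any line whose first non-blank character is '<', and only then split the remaining lines into words with std::istringstream.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: reads `input` line by line and inserts every word from
// lines that do NOT start with '<' into `tree` (std::set stands in for your tree).
void collectWords(std::istream& input, std::set<std::string>& tree)
{
    std::string line;
    while (std::getline(input, line)) {
        // Position of the first non-blank character ('\r' covers Windows line endings).
        std::string::size_type first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) {
            continue;                      // blank line: nothing to insert
        }
        if (line[first] == '<') {
            continue;                      // line starts with a tag: skip it entirely
        }
        // Split the kept line into whitespace-delimited words.
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            tree.insert(word);
        }
    }
}
```

Usage would be along the lines of std::ifstream input("page.html"); std::set<std::string> tree; collectWords(input, tree); where the file name is a placeholder. If you prefer to keep the input >> word loop, the closest trick is input.ignore(std::numeric_limits<std::streamsize>::max(), '\n') (from <limits>) to discard the rest of the current line after reading a word that starts with '<'. That only matches your rule when that word is the first one on its line, which >> alone cannot tell you.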
For an HTML file, you really should not assume that a line break means anything. Whitespace is also insignificant in many places (a tag can sit directly against text, as in <body>This), so do not rely on spaces to separate words from markup either. The example below is a valid HTML file, but with your approach its first line would be ignored completely because it starts with '<', and the sentence would be cut in half, leaving only its second line. That is probably not what you want.
<!DOCTYPE html><html><head></head><body>This Page is very poor <a href="/"> and hyperlinks</a>
are usually inlined.</body></html>
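One way to follow that advice, sketched here under simplifying assumptions, is to ignore line structure entirely and scan character by character: drop everything between '<' and the next '>', and treat both whitespace and tags as word separators. The helper name extractWords is hypothetical, and the sketch does not handle comments, <script>/<style> contents, a '>' inside a quoted attribute value, or entities such as &amp;; a real HTML parser is the robust option.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on a string
#include <string>
#include <vector>

// Hypothetical helper: returns the words of the visible text in `input`,
// discarding anything inside <...>. Spaces and line breaks are both plain
// separators, and every tag also ends the current word.
std::vector<std::string> extractWords(std::istream& input)
{
    std::vector<std::string> words;
    std::string current;
    bool insideTag = false;
    char c;

    // Move the word being built (if any) into the result.
    auto flush = [&]() {
        if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    };

    while (input.get(c)) {
        if (insideTag) {
            if (c == '>') {
                insideTag = false;         // end of tag: back to text
            }
        } else if (c == '<') {
            flush();                       // a tag separates words
            insideTag = true;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();                       // spaces and newlines separate words
        } else {
            current += c;
        }
    }
    flush();                               // last word at end of input
    return words;
}
```

Because '\n' is handled exactly like a space, the two-line example page yields the same words it would on a single line. Punctuation such as a trailing '.' stays attached to its word.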
I am searching 50 local HTML-type files like this one:
experimental investigation of the aerodynamics of a
wing in a slipstream .
j. ae. scs. 25, 1958, 324.
an experimental study of a wing in a propeller slipstream was
made in order to determine the spanwise distribution of the lift
increase due to slipstream at different angles of attack of the wing
and at different free stream to slipstream velocity ratios . the
results were intended in part as an evaluation basis for different
theoretical treatments of this problem .
the comparative span loading curves, together with supporting
evidence, showed that a substantial part of the lift increment
produced by the slipstream was due to a /destalling/ or boundary-layer-control
effect . the integrated remaining lift increment,
after subtracting this destalling lift, was found to agree
well with a potential flow theory .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
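For a set of files like the sample above, a sketch that walks a directory with C++17 std::filesystem, skips lines that start with '<', and normalizes tokens before inserting them, so that the stand-alone "." and "," tokens do not become tree entries and forms like "/destalling/" or "324." are stored as plain words. The function names (normalize, indexDirectory), the directory layout, and the normalization rule (lowercase, trim leading and trailing punctuation) are assumptions for illustration; std::set again stands in for the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <fstream>
#include <set>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical normalization: trim leading/trailing punctuation and lowercase,
// so "/destalling/" -> "destalling" and "324." -> "324".
// Returns an empty string for pure punctuation such as "." or ",".
std::string normalize(const std::string& token)
{
    auto isPunct = [](char ch) { return std::ispunct(static_cast<unsigned char>(ch)) != 0; };
    std::string::size_type begin = 0;
    std::string::size_type end = token.size();
    while (begin < end && isPunct(token[begin])) ++begin;
    while (end > begin && isPunct(token[end - 1])) --end;
    std::string word = token.substr(begin, end - begin);
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return word;
}

// Hypothetical driver: reads every regular file directly inside `dir`,
// skips lines whose first non-blank character is '<', and inserts the
// normalized words of all other lines into `tree`.
void indexDirectory(const fs::path& dir, std::set<std::string>& tree)
{
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream input(entry.path());
        std::string line;
        while (std::getline(input, line)) {
            std::string::size_type first = line.find_first_not_of(" \t\r");
            if (first == std::string::npos || line[first] == '<') continue;
            std::istringstream tokens(line);
            std::string token;
            while (tokens >> token) {
                std::string word = normalize(token);
                if (!word.empty()) tree.insert(word);
            }
        }
    }
}
```

This needs C++17 (some older GCC releases also require linking with -lstdc++fs). The trimming rule is deliberately crude: internal hyphens survive, so boundary-layer-control stays one word, while any leading or trailing punctuation is dropped.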