Finding differences between two htm files

I have two .htm files. Each one lists components with 9 parameters per component, and there are a lot of components in each list. When I open one as a text file, it looks like this:

<html>
<head><title>Component Report</title></head>
<b><u>Design Name</u>&nbsp D:/Work/14MB58/14MB58/14MB58/BRD/14MB58.brd</b><br>
<b><u>Date</u>&nbsp Tue Jun 23 13:39:30 2015</b><br>
<b><u>Total Components: </b></u>&nbsp885<br><br><BODY>

<center><table border><caption><b><u>Component Report</u></b></caption>
<tr>
<td><b>REFDES</b></td>
<td><b>COMP_DEVICE_TYPE</b></td>
<td><b>COMP_VALUE</b></td>
<td><b>COMP_TOL</b></td>
<td><b>COMP_PACKAGE</b></td>
<td><b>SYM_X</b></td>
<td><b>SYM_Y</b></td>
<td><b>SYM_ROTATE</b></td>
<td><b>SYM_MIRROR</b></td>
</tr>
<tr>
<td nowrap>C1</td>
<td nowrap>CAP SMD 1UF 6_0.3V K X5R 0402_S</td>
<td nowrap>1uF</td>
<td nowrap>10%</td>
<td nowrap>SMC0402</td>
<td nowrap align=right>32.0799</td>
<td nowrap align=right>14.8557</td>
<td nowrap align=right>270.000</td>
<td nowrap>NO</td>
</tr>
<tr>
<td nowrap>C2</td>
<td nowrap>CAP SMD 0.1UF 6.3V K X5R 0201_S</td>
<td nowrap>0.1uF</td>
<td nowrap>10%</td>
<td nowrap>SMC0201</td>
<td nowrap align=right>37.4558</td>
<td nowrap align=right>6.4333</td>
.
.
.
.
.
and it goes on like this for about 100 pages, and there are two such files. I am planning to put each component into a struct and build a vector of structs for each file. Then I will take the first field of the first component and search for the same value in the second list, and continue like that. If a component can't be found in the second list, I will write it to the output.
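
A minimal sketch of that plan, assuming REFDES is unique within each report and can serve as the lookup key. The names `Component`, `ReportDiff` and `compareReports` are made up for illustration, and the X1/X2 rows in the demo are invented test data. Instead of scanning the whole second list for every component, the sketch indexes the second list in a hash map first:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// One table row of the report; members follow the nine column headers.
struct Component {
    std::string refdes, deviceType, value, tol, package;
    std::string symX, symY, symRotate, symMirror;
};

// True when all nine fields match. Coordinates are compared as text,
// so "32.08" and "32.0800" would count as different.
bool sameFields(const Component& a, const Component& b) {
    auto key = [](const Component& c) {
        return std::tie(c.refdes, c.deviceType, c.value, c.tol, c.package,
                        c.symX, c.symY, c.symRotate, c.symMirror);
    };
    return key(a) == key(b);
}

// REFDES values grouped by how the two reports differ.
struct ReportDiff {
    std::vector<std::string> onlyInFirst;
    std::vector<std::string> onlyInSecond;
    std::vector<std::string> changed;
};

ReportDiff compareReports(const std::vector<Component>& first,
                          const std::vector<Component>& second) {
    // Index the second report by REFDES (assumed unique) for fast lookups.
    std::unordered_map<std::string, const Component*> index;
    for (const Component& c : second) index[c.refdes] = &c;

    ReportDiff diff;
    std::unordered_set<std::string> seenInFirst;
    for (const Component& c : first) {
        seenInFirst.insert(c.refdes);
        auto it = index.find(c.refdes);
        if (it == index.end())
            diff.onlyInFirst.push_back(c.refdes);
        else if (!sameFields(c, *it->second))
            diff.changed.push_back(c.refdes);
    }
    // Anything in the second report that the first never mentioned.
    for (const Component& c : second)
        if (seenInFirst.count(c.refdes) == 0)
            diff.onlyInSecond.push_back(c.refdes);
    return diff;
}

// Usage example: C1 is the sample row, X1 and X2 are invented.
void demoCompareReports() {
    std::vector<Component> a = {
        {"C1", "CAP SMD 1UF 6_0.3V K X5R 0402_S", "1uF", "10%", "SMC0402",
         "32.0799", "14.8557", "270.000", "NO"},
        {"X1", "TEST PART", "1", "1%", "PKG", "0.0", "0.0", "0.000", "NO"}};
    std::vector<Component> b = a;
    b[0].symRotate = "90.000";  // C1 rotated in the second report
    b[1].refdes = "X2";         // X1 replaced by X2

    ReportDiff diff = compareReports(a, b);
    assert(diff.changed.size() == 1 && diff.changed[0] == "C1");
    assert(diff.onlyInFirst.size() == 1 && diff.onlyInFirst[0] == "X1");
    assert(diff.onlyInSecond.size() == 1 && diff.onlyInSecond[0] == "X2");
}
```

With roughly 885 components per file a plain nested search would also finish quickly, so the map is mainly a convenience. It also makes it easy to report components that exist in both files but have different values.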

Am I thinking along the right lines, or is there a standard algorithm or HTML parsing approach for this kind of thing?

My second question: once I have written this program, can it be used without Visual Studio? I have never tried this. It will be a C++ console application.

I am open to any suggestions. I have 30 days to finish the program for my internship, and I will be asking coding questions too. Thanks in advance :)

Edit: it doesn't even read the .htm file directly. Is there any way to open it? :/
An HTML file is just a text file with a different extension, so that a web browser knows to render the HTML code as a web page. Yes, you can read it just like a .txt file.
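
As a rough illustration, here is one way to read such a report with `std::ifstream` and pull the cell text out of each table row. It assumes the layout shown in the question: `<tr>`, `</tr>` and each `<td ...>` cell start at the beginning of their own line. The function names are made up for this sketch, and entities such as `&nbsp` are left undecoded.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Drops everything between '<' and '>' and keeps the remaining text.
std::string stripTags(const std::string& line) {
    std::string out;
    bool inTag = false;
    for (char ch : line) {
        if (ch == '<') inTag = true;
        else if (ch == '>') inTag = false;
        else if (!inTag) out += ch;
    }
    return out;
}

// Collects the cells of every <tr> ... </tr> block, one inner vector per row.
// The header row comes back too, as the first row.
std::vector<std::vector<std::string>> readRows(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::vector<std::string> current;
    bool inRow = false;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings when the file is read elsewhere.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.rfind("<tr", 0) == 0) {            // line starts with "<tr"
            inRow = true;
            current.clear();
        } else if (line.rfind("</tr>", 0) == 0) {   // line starts with "</tr>"
            if (inRow) rows.push_back(current);
            inRow = false;
        } else if (inRow && line.rfind("<td", 0) == 0) {
            current.push_back(stripTags(line));
        }
    }
    return rows;
}

// An .htm file opens like any other text file.
// If the file cannot be opened, the result is simply empty.
std::vector<std::vector<std::string>> readReport(const std::string& path) {
    std::ifstream in(path);
    return readRows(in);
}

// Usage example on two lines taken from the sample report.
void demoStripTags() {
    assert(stripTags("<td nowrap>C1</td>") == "C1");
    assert(stripTags("<td><b>REFDES</b></td>") == "REFDES");
}
```

Each data row should come back with nine strings, which can then be copied into whatever struct holds a component.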

Idea 1.

Since your subject says "finding differences between two htm files", I'm assuming the two files are mostly alike.

I'm not sure a line-by-line comparison is appropriate if line 3 in one file ends up on line 4 in the other.

If that is not the case, then I'd first compare line 1 of file 1 to line 1 of file 2.
If they are the same, move on to the next pair of lines.

Once that is working, you can break down the lines that do not match to see what is different.
I think I'd do that char by char.
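
A small sketch of Idea 1, assuming both files have already been read into vectors of lines and that matching lines sit at the same position in both files. `LineDiff`, `firstDifference` and `compareLines` are names invented here. The char-by-char step uses `std::mismatch` from `<algorithm>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where two files differ: 1-based line number, 0-based column.
struct LineDiff {
    std::size_t line;
    std::size_t column;
};

// Index of the first differing character, or npos when the lines are equal.
// If one line is a prefix of the other, the shorter length is returned.
std::size_t firstDifference(const std::string& a, const std::string& b) {
    if (a == b) return std::string::npos;
    auto where = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
    return static_cast<std::size_t>(where.first - a.begin());
}

// Compares line i of one file with line i of the other.
// Lines present in only one file are reported at column 0.
std::vector<LineDiff> compareLines(const std::vector<std::string>& a,
                                   const std::vector<std::string>& b) {
    std::vector<LineDiff> diffs;
    const std::size_t count = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < count; ++i) {
        if (i >= a.size() || i >= b.size()) {
            diffs.push_back({i + 1, 0});
            continue;
        }
        const std::size_t column = firstDifference(a[i], b[i]);
        if (column != std::string::npos) diffs.push_back({i + 1, column});
    }
    return diffs;
}

// Usage example with two short made-up "files".
void demoCompareLines() {
    std::vector<std::string> first = {"C1", "32.0799"};
    std::vector<std::string> second = {"C1", "32.0899"};
    std::vector<LineDiff> diffs = compareLines(first, second);
    assert(diffs.size() == 1);
    assert(diffs[0].line == 2);    // second line differs
    assert(diffs[0].column == 4);  // '7' versus '8'
}
```

If a component is added or removed in the middle of one file, every later line shifts and this positional comparison reports all of them, which is the concern raised above.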


Idea 2
If the two files were saved by different HTML editing programs, they will be written out completely differently even when the content is the same plain text. In that case you're over my head, but I think the right approach would be to find the similarities first and then consider the rest.