1) Open both files.
2) Iterate over them both at the same time.
3) If the chars are the same, increment an are_similar counter.
4) If the chars differ, increment an are_different counter.
5) similarity_percentage = are_similar * 100 / (are_similar + are_different)
And here's some sample code to get you started. It wasn't tested.
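The five steps above can be sketched in Python as follows. This is a minimal sketch, assuming both files are UTF-8 text. It makes one assumption beyond the steps: characters left over in the longer file count as differences, since otherwise a file compared against its own first half would score 100%. The names positional_similarity and compare_files are hypothetical.

```python
import sys


def positional_similarity(text_a: str, text_b: str) -> float:
    """Percentage of positions where both texts hold the same character."""
    are_similar = 0
    are_different = 0
    # Step 2: walk both texts in lockstep (zip stops at the shorter one).
    for ch_a, ch_b in zip(text_a, text_b):
        if ch_a == ch_b:
            are_similar += 1    # step 3
        else:
            are_different += 1  # step 4
    # Assumption: surplus characters in the longer text count as differences.
    are_different += abs(len(text_a) - len(text_b))
    total = are_similar + are_different
    if total == 0:
        return 100.0  # two empty files: treat as identical
    return are_similar * 100 / total  # step 5


def compare_files(path_a: str, path_b: str) -> float:
    # Step 1: open both files (assumed to be UTF-8 text).
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return positional_similarity(fa.read(), fb.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    print(f"{compare_files(sys.argv[1], sys.argv[2]):.2f}%")
```

Saved as, say, compare.py (hypothetical name), it would be run as `python compare.py file1.txt file2.txt`.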
Do you expect the two files to line up character for character (no insertions or deletions), and just want to count how many characters differ between them? Then go for something like Catfish3's suggestion.
If not, you need a far more sophisticated approach, as ne555 suggested.
1) You will need a similarity measure, i.e. a score for each pair of characters. In your case this may be easy: for example, give each pair of equivalent characters in the two files a score of +1 if the chars are equal and 0 if they differ. Depending on the task, though, this measure may be too simple (is an "E" as different from an "e" as from a "Y"?).
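One way to encode that question is a graded score instead of a plain +1/0. The sketch below is an assumption, not a recommendation: exact matches score 1, letters that differ only in case get a hypothetical 0.5, and everything else scores 0. It still pairs characters by position, so it presumes you already know which characters are equivalent. char_score and total_score are hypothetical names.

```python
def char_score(a: str, b: str) -> float:
    """Similarity score for one pair of equivalent characters."""
    if a == b:
        return 1.0
    if a.casefold() == b.casefold():
        return 0.5  # hypothetical partial credit for a case-only difference
    return 0.0


def total_score(text_a: str, text_b: str) -> float:
    # Pairs characters by position, i.e. assumes no insertions or deletions.
    return sum(char_score(a, b) for a, b in zip(text_a, text_b))


print(char_score("E", "e"), char_score("E", "Y"))  # 0.5 0.0
print(total_score("Eat", "eat"))                   # 2.5
```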
2) Because of possible insertions and deletions (the example ne555 gave contains both, or equivalently a single "move"), the problem gets far more complicated: it is no longer obvious which characters in the two files are equivalent.
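A tiny worked example of why position-by-position pairing breaks down: inserting one character at the front of a string shifts every later character, so a naive positional count (the hypothetical positional_matches below) finds no matches at all, even though the two strings differ by a single edit.

```python
def positional_matches(text_a: str, text_b: str) -> int:
    """Count positions where both texts hold the same character."""
    return sum(1 for a, b in zip(text_a, text_b) if a == b)


original = "abcdef"
edited = "xabcdef"  # one character inserted at the front
print(positional_matches(original, edited))  # 0
```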
You may want to look at the UNIX diff command and the algorithms behind tools like it. Version control systems such as Git perform this task constantly.
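If Python is an option, its standard library ships a diff-style matcher, difflib.SequenceMatcher. It is not the algorithm Git or UNIX diff use internally, but it does cope with insertions and deletions. Its ratio() method returns 2*M/T, where M is the number of matched characters and T the combined length of both inputs; scaling that to a percentage is an assumption here, and diff_similarity is a hypothetical name.

```python
from difflib import SequenceMatcher


def diff_similarity(text_a: str, text_b: str) -> float:
    # ratio() = 2*M/T: M matched characters, T = len(text_a) + len(text_b).
    # autojunk=False stops SequenceMatcher from ignoring very frequent
    # characters, which it otherwise does for inputs of 200+ characters.
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio() * 100


print(round(diff_similarity("abcdef", "xabcdef"), 1))  # 92.3
```

One inserted character now costs only a few percent instead of wiping out every match after it.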
I suggest you look up "edit distance" (for example, the Levenshtein distance) on Google.
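The classic edit distance is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions that turn one string into the other. Below is a standard two-row dynamic-programming sketch in Python. Turning the distance into a percentage by dividing by the longer length is one common convention, assumed here; edit_distance and edit_similarity are hypothetical names. Running time is O(len(a) * len(b)), so it suits modest file sizes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] = distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # Assumed convention: 100% minus the distance relative to the longer text.
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return (1 - edit_distance(a, b) / longest) * 100


print(edit_distance("kitten", "sitting"))                # 3
print(round(edit_similarity("abcdef", "xabcdef"), 1))   # 85.7
```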
In bioinformatics, a special version of your problem (where the two files being compared are actually DNA strands) is usually solved with dynamic programming, specifically the Needleman-Wunsch algorithm.
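A compact Needleman-Wunsch sketch in Python, with assumed scores of +1 for a match, -1 for a mismatch and -1 per gap (real DNA tools use tuned scoring schemes). It fills the dynamic-programming table, then traces back one optimal global alignment, with "-" marking a gap. When several alignments tie, the traceback's preference order (diagonal, then up, then left) decides which one you get.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two chars
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```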
Finally, in the case of XML, you may be better off using an XML parser first (unless you expect one of the files to be broken XML due to the changes). I suggest you describe in more detail why you want to compare the files.
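If the files are XML, here is a rough Python sketch of "parse first" using the standard xml.etree.ElementTree module. It compares only the text content, so tags, attributes and indentation changes are ignored (an assumption that may not fit your case), and it reuses difflib.SequenceMatcher for the score. Malformed XML raises ET.ParseError, which is the broken-XML case mentioned above. xml_text and xml_similarity are hypothetical names.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def xml_text(xml_source: str) -> str:
    """Text content of an XML document with whitespace collapsed."""
    root = ET.fromstring(xml_source)  # raises ET.ParseError on broken XML
    # itertext() yields every text and tail string; tags and attributes are dropped.
    return " ".join(" ".join(root.itertext()).split())


def xml_similarity(xml_a: str, xml_b: str) -> float:
    text_a, text_b = xml_text(xml_a), xml_text(xml_b)
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio() * 100


print(xml_similarity("<r><a>Hi</a></r>", "<r>\n  <a>Hi</a>\n</r>"))  # 100.0
```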
Okay. Well, I need a program that works like a plagiarism checker, but only for two files (either Excel files or any text documents). I need to compare the two files and display their similarity percentage. Suppose the first word of file1 is "Is" and file2 contains "is": the program should treat them as the same word, count the total number of similar words, and display the result as a PERCENTAGE.
I would appreciate any help :)
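For the word-level, case-insensitive comparison described above, one simple option (an assumption; real plagiarism checkers do considerably more) is the Dice coefficient on word multisets: casefold every word, count the words both files share (repeats included), and report 2 * shared / (words in file1 + words in file2) as a percentage. It ignores word order. Excel workbooks (.xlsx) are zipped XML rather than plain text, so this sketch assumes the sheets were exported to text or CSV first. words and word_similarity are hypothetical names.

```python
import re
from collections import Counter


def words(text: str) -> list:
    # Assumption: a "word" is a run of letters, digits or underscores; case is ignored.
    return [w.casefold() for w in re.findall(r"\w+", text)]


def word_similarity(text_a: str, text_b: str) -> float:
    wa, wb = Counter(words(text_a)), Counter(words(text_b))
    shared = sum((wa & wb).values())  # words in common, counting repeats
    total = sum(wa.values()) + sum(wb.values())
    if total == 0:
        return 100.0  # two empty files: treat as identical
    return 2 * shared * 100 / total


print(word_similarity("Is this it?", "is this IT"))             # 100.0
print(round(word_similarity("The cat sat", "the dog sat"), 1))  # 66.7
```

To compare real files, pass in their contents, e.g. the result of open(path, encoding="utf-8").read() for each one.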