mwisebaker
Member
30 Points
66 Posts
Use XPath To Identify Differences Between Two Documents?
Jan 04, 2012 01:13 PM
Hello,
I've got two XML documents with the same structure. The format is pretty simple; it's something like this:
<Report>
  <Findings>
    <Finding>
      <Message>Blah</Message>
      <Item>Blah</Item>
      <Severity>Blah</Severity>
    </Finding>
    <Finding>
      <Message>Blah</Message>
      <Item>Blah</Item>
      <Severity>Blah</Severity>
    </Finding>
  </Findings>
</Report>
I've got two documents from runs a week apart, and I'd like to quickly discover which findings (identified by the combination of Message and Item) are new from one week to the next. I've been able to do this in .NET with a couple of variations on an outer loop and an inner loop over the nodes in each document, comparing the values. The problem is that each report contains over 8,000 findings, and every variation I've tried takes a very long time to run.
If I've got both documents loaded in memory, is there some quick way to compare the two using XPath or something similar and extract a node-set that exists in only one document and not the other?
rtpHarry
All-Star
56620 Points
8958 Posts
Re: Use XPath To Identify Differences Between Two Documents?
Jan 04, 2012 09:32 PM
The simplest option would seem to be to get the app to generate a new log each week rather than including the previous week's data? :)
I think I would probably approach this by creating a class, loading the whole thing into memory, and using LINQ to manipulate it.
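To make the "create a class, load everything into memory, then query it" idea concrete, here is a minimal sketch. It is written in Python rather than C#, so it is an illustration of the approach and not the LINQ code itself; the `Finding` class, the function names, and the sample data are all hypothetical. In .NET the same shape would be a small class plus a `HashSet` or a LINQ query.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One <Finding> element (hypothetical class, not from the thread)."""
    message: str
    item: str
    severity: str

    @property
    def key(self):
        # Per the question, a finding is identified by Message + Item.
        return (self.message, self.item)


def load_findings(xml_text):
    """Parse a report and return its findings as a list of Finding objects."""
    root = ET.fromstring(xml_text)
    return [
        Finding(
            message=node.findtext("Message", default=""),
            item=node.findtext("Item", default=""),
            severity=node.findtext("Severity", default=""),
        )
        for node in root.iter("Finding")
    ]


def new_findings(old_xml, new_xml):
    """Return findings in new_xml whose (Message, Item) key is not in old_xml."""
    old_keys = {f.key for f in load_findings(old_xml)}
    return [f for f in load_findings(new_xml) if f.key not in old_keys]


# Made-up sample reports in the format shown in the question.
LAST_WEEK = """<Report><Findings>
<Finding><Message>Message A</Message><Item>Item 1</Item><Severity>Warning</Severity></Finding>
</Findings></Report>"""

THIS_WEEK = """<Report><Findings>
<Finding><Message>Message A</Message><Item>Item 1</Item><Severity>Warning</Severity></Finding>
<Finding><Message>Message B</Message><Item>Item 2</Item><Severity>Error</Severity></Finding>
</Findings></Report>"""

print([f.key for f in new_findings(LAST_WEEK, THIS_WEEK)])
# prints [('Message B', 'Item 2')]
```

Severity is loaded but deliberately left out of the key, so a finding whose severity changed between runs is not reported as new.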
kuber.manral
Contributor
3051 Points
714 Posts
Re: Use XPath To Identify Differences Between Two Documents?
Jan 05, 2012 07:34 AM
Hi,
There may be two approaches to handle this kind of situation.
Solution 1: You have two XML files with the same structure, so use the XPathNavigator class to walk file 1 and compare it against file 2. Start from the first Finding and compare its first node (Message here). If it does not match the previous value, there is no need to compare any further: stop there, store the index somewhere (mark this Finding), and go to the next Finding, where the same process applies. At the end, you have only those Findings whose values have changed. This should reduce the number of comparisons.
Solution 2: Create a log file (in the file system, in a database, or somewhere else) and compare against that. Next time you would have a shorter file to compare, because the previous findings were already filtered out.
Sorry if I misunderstood.
Correct me if I am wrong!
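One possible reading of Solution 2 is to keep a baseline of findings already seen and compare each new report against it. The sketch below follows that reading in Python rather than .NET; the JSON baseline format and the names `finding_keys` and `diff_against_baseline` are assumptions made for illustration, not something the post specifies.

```python
import json
import os
import xml.etree.ElementTree as ET


def finding_keys(xml_text):
    """Return the set of (Message, Item) pairs found in a report."""
    root = ET.fromstring(xml_text)
    return {
        (node.findtext("Message", default=""), node.findtext("Item", default=""))
        for node in root.iter("Finding")
    }


def diff_against_baseline(xml_text, baseline_path):
    """Return keys not yet in the baseline file, then record them there."""
    seen = set()
    if os.path.exists(baseline_path):
        with open(baseline_path, encoding="utf-8") as fh:
            # JSON has no tuples, so each key is stored as a two-element list.
            seen = {tuple(pair) for pair in json.load(fh)}

    current = finding_keys(xml_text)
    added = current - seen

    # Persist everything seen so far for the next run.
    with open(baseline_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen | current), fh)
    return added
```

Because this baseline accumulates every finding ever seen, a finding that was fixed and later reintroduced would not be reported again. Writing only `current` to the file instead would give a strict week-to-week comparison.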
mwisebaker
Member
30 Points
66 Posts
Re: Use XPath To Identify Differences Between Two Documents?
Jan 05, 2012 12:32 PM
Well, a little bit of context might help here. ;) The XML files are listings of FxCop code analysis errors uncovered in a particular codebase. What I'm trying to do is write a utility to uncover which errors were introduced in the past week. The app that generates the XML files only knows what the errors look like at that time; as far as I can find, it has no way to reference a previous version of the codebase to generate a delta.
That being said, I ended up going the LINQ route. It's significantly faster than any method I tried that involved looping through one or both of the files.
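The post doesn't say which LINQ operators were used, but a likely reason a set-based query beats nested loops is the amount of work involved. With about 8,000 findings per report, an outer and inner loop can make on the order of 8,000 × 8,000 = 64 million comparisons, while a hash-based lookup handles each finding roughly once. The Python sketch below contrasts the two on plain (Message, Item) tuples; the function names and data are made up for illustration.

```python
def new_by_nested_loops(old_keys, new_keys):
    """Quadratic approach: scan the old list for every new key."""
    comparisons = 0
    added = []
    for key in new_keys:
        found = False
        for old in old_keys:
            comparisons += 1
            if key == old:
                found = True
                break
        if not found:
            added.append(key)
    return added, comparisons


def new_by_hash_set(old_keys, new_keys):
    """Linear approach: build a set once, then do constant-time lookups."""
    seen = set(old_keys)
    return [key for key in new_keys if key not in seen]


# Synthetic data: 200 old findings; the new run keeps half and adds one.
old = [("msg%d" % i, "item%d" % i) for i in range(200)]
new = old[100:] + [("msgX", "itemX")]

added_slow, comparisons = new_by_nested_loops(old, new)
added_fast = new_by_hash_set(old, new)
print(added_slow == added_fast, comparisons)
```

Both functions return the same findings. Only the number of comparisons differs, and that gap widens as the reports grow.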
Decker Dong ...
All-Star
118619 Points
18779 Posts
Re: Use XPath To Identify Differences Between Two Documents?
Jan 06, 2012 12:35 AM
Congratulations, mwisebaker. I hope you will come back often to share solutions and feedback!