XML file - removal of spacing
May 12, 2017 01:30 PM|mark1961
I am attempting to process an XML file that I receive from another party. However, many elements contain blocks of white space, and in places this breaks the conversion of certain fields, such as prices and dates.
These files can be large, so I'm not sure how feasible it would be to do a replace on the whole file. Besides, where a datetime field is affected, e.g. "2016-08-23 00:0 0:00", I have to identify the time part and do the replace only on that; otherwise the replace also removes the space between the date and the time, and my conversion fails on that.
Can I do anything to "clean" the whole file, or am I limited to treating each affected type of value one by one in the code?
Re: XML file - removal of spacing
May 12, 2017 04:06 PM|deepalgorithm
I don't think a global replace is feasible on a large file, especially with edge cases like that datetime scenario.
I would use an XmlReader to parse the large XML document, combined with an XmlWriter to write the values back out with the white space replaced. XmlReader and XmlWriter work as sequential streams and provide fast, forward-only, non-cached access to XML data. Basically, you read in on one end, process each node however you want, and write it out the other end. The advantage is that you don't need to read the whole document into memory and build a DOM, so the combination of the two should yield good performance.
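As a rough illustration of that read-process-write loop, here is a minimal sketch. The class name XmlWhitespaceCleaner and the cleaning rule in CleanValue (trim, then collapse inner runs of white space to a single space) are assumptions made for the example, not something from the thread. It deliberately does not try to repair a stray space inside a time such as "00:0 0:00"; that still needs field-specific handling.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlWhitespaceCleaner
{
    // Hypothetical cleaning rule: trim the value and collapse
    // inner runs of white space to a single space.
    public static string CleanValue(string value)
    {
        return Regex.Replace(value.Trim(), @"\s+", " ");
    }

    // Copies the document node by node from reader to writer,
    // cleaning text nodes on the way. Nothing is cached, so memory
    // use does not grow with the size of the file.
    public static void Clean(XmlReader reader, XmlWriter writer)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement)
                    {
                        writer.WriteEndElement();
                    }
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(CleanValue(reader.Value));
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                // Other node types (white-space-only nodes, comments,
                // processing instructions, the XML declaration) are
                // skipped in this sketch.
            }
        }
    }

    // Convenience wrapper for small inputs; for large files, create
    // the reader and writer over file paths or streams instead.
    public static string CleanString(string xml)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            Clean(reader, writer);
        }
        return output.ToString();
    }
}
```

For a file on disk, the same Clean method could be called with XmlReader.Create("input.xml") and XmlWriter.Create("output.xml"), where both file names are placeholders.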
Re: XML file - removal of spacing
May 12, 2017 04:08 PM|mgebhard
Your question is a bit vague. Is there any reason why you can't simply trim the white space when parsing the XML file?
On the datetime replace: why? The date is fine. I'm guessing you're doing a string.Replace? If so, don't; just trim the field.
On cleaning the whole file: I can't see the file, but I've had to clean up files like this before. I usually load the contents into a temporary container, such as a table where all the fields are strings, and then do a second parsing pass to get the types.
Have you tried asking the sender to stop putting white space in the fields?
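A second parsing pass over string fields might look something like the sketch below. The class name FieldParsers, the method names, and the "yyyy-MM-dd HH:mm:ss" layout (inferred from the one example in the question) are all assumptions. Note that it goes a step further than trimming: Trim only removes leading and trailing white space, so the example value "2016-08-23 00:0 0:00" would still fail to parse after a plain trim. Here all white space is stripped and the value is parsed against a format with no space between date and time.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FieldParsers
{
    // Assumption: a price never contains meaningful white space,
    // so all of it can be removed before parsing.
    public static decimal ParsePrice(string raw)
    {
        string compact = Regex.Replace(raw, @"\s+", "");
        return decimal.Parse(compact, CultureInfo.InvariantCulture);
    }

    // Assumption: date-times arrive as "yyyy-MM-dd HH:mm:ss", possibly
    // with stray white space anywhere. Removing every space and parsing
    // against a space-free format avoids having to locate the time part.
    public static DateTime ParseDateTime(string raw)
    {
        string compact = Regex.Replace(raw, @"\s+", "");
        return DateTime.ParseExact(compact, "yyyy-MM-ddHH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

With these helpers, FieldParsers.ParseDateTime("2016-08-23 00:0 0:00") yields midnight on 23 August 2016, and a value that does not match the assumed layout throws a FormatException rather than being silently accepted.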