Last post Dec 01, 2010 09:43 AM by Martin_Honnen
Dec 01, 2010 05:31 AM|m-burgess
I have come across an issue when loading XML files in VB.NET. It appears that the XML file contains special characters from Microsoft Office, the main culprit being the character with hexadecimal value 0x12 reported in the exception below.
This is the code I am using to load the XML file(s), and this is where the exception is triggered.
Dim xmlDoc As XmlDocument = New XmlDocument()
xmlDoc.Load("\\brevf03\Incoming\Documents\" + fileName + ".xml")
Is there any way these special characters can be handled?
Any help is greatly appreciated.
The exception is below:
System.Xml.XmlException: '', hexadecimal value 0x12, is an invalid character. Line 6840, position 37.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String args)
at System.Xml.XmlTextReaderImpl.Throw(Int32 pos, String res, String args)
at System.Xml.XmlTextReaderImpl.ThrowInvalidChar(Int32 pos, Char invChar)
at System.Xml.XmlTextReaderImpl.ParseCDataOrComment(XmlNodeType type, Int32& outStartPos, Int32& outEndPos)
at System.Xml.XmlTextReaderImpl.ParseCDataOrComment(XmlNodeType type)
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(String filename)
at _Default.Page_Load(Object sender, EventArgs e) in D:\Websites\WebSites\CaseHistory\Default.aspx.vb:line 69
Dec 01, 2010 08:22 AM|Martin_Honnen
That character (0x12) is a control character that is not allowed in XML 1.0. The only control characters below 0x20 that XML 1.0 allows are those with Unicode code points 9 (tab), 10 (line feed) and 13 (carriage return).
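A minimal sketch of that rule, assuming the Char production from section 2.2 of the XML 1.0 specification. The module and function names here are hypothetical, chosen only for illustration:

```vbnet
Imports System

Module XmlCharCheck
    ' Returns True if the Unicode code point matches the XML 1.0 "Char" production:
    ' #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    ' (IsAllowedInXml10 is a hypothetical helper name.)
    Public Function IsAllowedInXml10(ByVal codePoint As Integer) As Boolean
        Return codePoint = &H9 _
            OrElse codePoint = &HA _
            OrElse codePoint = &HD _
            OrElse (codePoint >= &H20 AndAlso codePoint <= &HD7FF) _
            OrElse (codePoint >= &HE000 AndAlso codePoint <= &HFFFD) _
            OrElse (codePoint >= &H10000 AndAlso codePoint <= &H10FFFF)
    End Function
End Module
```

With this check, IsAllowedInXml10(&H12) is False for the character from the exception, while tab (&H9), line feed (&HA) and carriage return (&HD) pass.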
Dec 01, 2010 09:15 AM|m-burgess
Thank you for the reply.
Is my only option to remove the characters from the XML first? I am not sure how I could do that.
Dec 01, 2010 09:43 AM|Martin_Honnen
The main point of a standardized format like XML is to be able to use any XML parser on any platform to process an XML document. What you have is technically not a (well-formed) XML document, so the XML parser correctly rejects it. The right approach is to inform the creator or author of that document that he sent a malformed document and that he should fix the process creating the document. If that is not an option then you can try to correct any errors on your side, but don't expect the XML tools/APIs to be any help with that. You would need to use something like File.ReadAllText and then use string processing, probably with the help of regular expressions, to remove anything that is not allowed.
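A rough sketch of that workaround, under some assumptions: the module name, function names and the filePath parameter are hypothetical, and the pattern only strips the control characters below 0x20 that XML 1.0 forbids. It does not cover other disallowed code points such as U+FFFE, U+FFFF or unpaired surrogates.

```vbnet
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Xml

Module XmlSanitizer
    ' Matches the control characters below 0x20 that XML 1.0 does not allow,
    ' i.e. everything in that range except tab (0x09), LF (0x0A) and CR (0x0D).
    Private ReadOnly InvalidControlChars As New Regex("[\x00-\x08\x0B\x0C\x0E-\x1F]")

    ' Removes the disallowed control characters from a string.
    ' (RemoveInvalidXmlChars is a hypothetical helper name.)
    Public Function RemoveInvalidXmlChars(ByVal text As String) As String
        Return InvalidControlChars.Replace(text, String.Empty)
    End Function

    ' Reads the whole file as text, strips the disallowed characters,
    ' then parses the cleaned string instead of the file itself.
    Public Function LoadSanitized(ByVal filePath As String) As XmlDocument
        Dim cleaned As String = RemoveInvalidXmlChars(File.ReadAllText(filePath))
        Dim doc As New XmlDocument()
        doc.LoadXml(cleaned)
        Return doc
    End Function
End Module
```

In the original Page_Load, the call to xmlDoc.Load(...) could then be replaced by something like xmlDoc = XmlSanitizer.LoadSanitized("\\brevf03\Incoming\Documents\" & fileName & ".xml"). Note that File.ReadAllText assumes UTF-8 unless it finds a byte order mark, so a file in another encoding would need the overload that takes an Encoding. Removing characters also silently changes the data, which is why fixing the process that produces the file remains the better option.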