m.koopman
Participant
1372 Points
294 Posts
XML Parser handling invalid XML
Jan 04, 2012 02:20 PM
Hey,
I've got a question regarding the XmlTextReader object. I'm creating a parser that is supposed to cope with invalid XML or invalid XML elements. Everything is working fine, except the XmlTextReader.
P.S. I'm dealing with third-party XML files, so I have no control over what they submit. And I can't use the regular XDocument, because it throws an exception as soon as it runs into the problem.
This is the case:
I managed to get my custom parser working, even when it runs into a problem. What I do right now is complete the elements parsed so far and then return them.
However, when the XmlTextReader goes into the error state, I'm not able to get it out of that state, no matter what. This is exactly my problem: I know where I should continue, but I can't, because the XmlTextReader stays in the error state.
I tried using the Skip and ReadTo... methods to skip the line of XML that is messing up my parser, without success.
So if anyone has a solution for this problem, you would do me a great favour. If not, I will have to write my own XML text reader.
To clarify my problem, here is my code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public class Parser
{
    public delegate void OnParseError(Exception exception, XElement element);
    public delegate void OnParseComplete(XElement result);

    public static void Parse(string xml, OnParseError onError, OnParseComplete onComplete)
    {
        List<XElement> elements = new List<XElement>();
        Stack<XElement> elemstack = new Stack<XElement>();
        XmlTextReader reader = new XmlTextReader(new MemoryStream(Encoding.ASCII.GetBytes(xml)));
        bool done = false;
        while (!done)
        {
            try
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Attribute:
                            // Note: Read() never stops on attribute nodes, so this case is
                            // not reached; attributes are visited with MoveToNextAttribute().
                            if (elemstack.Count > 0)
                                elemstack.Peek().SetAttributeValue(XName.Get(reader.Name, reader.NamespaceURI), reader.Value);
                            break;
                        case XmlNodeType.Element:
                            XElement startElement = new XElement(XName.Get(reader.Name, reader.NamespaceURI));
                            elemstack.Push(startElement);
                            break;
                        case XmlNodeType.EndElement:
                            if (reader.Name != elemstack.Peek().Name)
                                reader.Skip();
                            else
                            {
                                // Close the current element and attach it to its parent.
                                var endElement = elemstack.Pop();
                                if (elemstack.Count > 0)
                                    elemstack.Peek().Add(endElement);
                                elements.Add(endElement);
                            }
                            break;
                        case XmlNodeType.Text:
                            elemstack.Peek().Value = reader.Value;
                            break;
                        case XmlNodeType.CDATA:
                        case XmlNodeType.Comment:
                        case XmlNodeType.Document:
                        case XmlNodeType.DocumentFragment:
                        case XmlNodeType.DocumentType:
                        case XmlNodeType.EndEntity:
                        case XmlNodeType.Entity:
                        case XmlNodeType.EntityReference:
                        case XmlNodeType.None:
                        case XmlNodeType.Notation:
                        case XmlNodeType.ProcessingInstruction:
                        case XmlNodeType.SignificantWhitespace:
                        case XmlNodeType.Whitespace:
                        case XmlNodeType.XmlDeclaration:
                        default:
                            break;
                    }
                }
                done = true;
                onComplete(elements.Last());
            }
            catch (XmlException ex)
            {
                if (elemstack.Count > 0)
                {
                    // Report the element that was being parsed when the error occurred,
                    // then close every element that is still open.
                    var errElem = elemstack.Pop();
                    onError(ex, errElem);
                    while (elemstack.Count > 0)
                    {
                        var endElement = elemstack.Pop();
                        if (elemstack.Count > 0) { elemstack.Peek().Add(endElement); }
                        elements.Add(endElement);
                    }
                    onComplete(elements.Last());
                }
                else
                    onError(ex, null);
                // This is where the reader would have to resume, but it stays in the error state.
            }
        }
    }
}
Matthijs Koopman
Please mark my reply as answer if you found it helpful
ignatandrei
All-Star
134575 Points
21588 Posts
Moderator
MVP
Re: XML Parser handling invalid XML
Jan 04, 2012 09:42 PM
If it is not valid XML, then it cannot be parsed. Period.
How do you parse
<person firstName=<AndreiLastName=Ignat ></lastName></Person>
?
That said, my only recommendation is to make your parser verify whether the input is valid XML. If it is not, either parse as much as you can or throw an error (maybe a human will find the error faster?).
If you want to handle it automatically, look at http://htmlagilitypack.codeplex.com/ to see how it parses HTML that is invalid (from an XML point of view) and turns it into XML.
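A minimal sketch of such an up-front check, in case it helps. It assumes the whole document is available as a string; `XmlCheck` and `IsWellFormed` are names made up for this example, not part of any library. Strictly speaking it tests well-formedness rather than validity against a schema, and with default settings XmlReader also rejects input that contains a DTD.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper (not from the thread): returns true when the whole
    // input can be read from start to end without an XmlException.
    public static bool IsWellFormed(string xml, out XmlException error)
    {
        error = null;
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Walk every node; the content itself is discarded.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException ex)
        {
            // ex.LineNumber and ex.LinePosition point at the offending spot.
            error = ex;
            return false;
        }
    }
}
```

The caller can then decide per document whether to run the normal parser, fall back to a lenient one, or log the line and position for a human to look at.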
m.koopman
Re: XML Parser handling invalid XML
Jan 05, 2012 07:11 AM
My parser will try to parse as much as possible. For example, when it fails on the third item of an RSS feed, it should skip that item and continue with the others.
The only thing I need to know is whether it is possible to resume the XmlTextReader after it has gone into an error state. If not, I'll write my own.
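One possible way to get the "skip the bad item, keep the rest" behaviour without resuming a failed reader is sketched below. It is a workaround under stated assumptions, not something XmlTextReader offers: it cuts the feed into `<item>...</item>` blocks with plain string searches and parses each block separately. `ItemSalvager` and `ParseItems` are hypothetical names. The sketch assumes items are not nested, that every item still has its closing `</item>` tag, and that items do not rely on namespace prefixes declared on an outer element.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class ItemSalvager
{
    // Hypothetical helper: parses each <item>...</item> block on its own so
    // that one malformed item does not stop the others from being read.
    public static List<XElement> ParseItems(string feed, Action<XmlException, string> onError)
    {
        const string open = "<item";
        const string close = "</item>";
        var items = new List<XElement>();
        int pos = 0;
        while (true)
        {
            int start = feed.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break;
            int afterName = start + open.Length;
            // Accept only "<item>" or "<item ...>"; skip names such as <items>
            // (and empty <item/> elements).
            if (afterName < feed.Length && feed[afterName] != '>' && !char.IsWhiteSpace(feed[afterName]))
            {
                pos = afterName;
                continue;
            }
            int end = feed.IndexOf(close, afterName, StringComparison.Ordinal);
            if (end < 0) break; // unterminated item: give up on the remainder
            end += close.Length;
            string fragment = feed.Substring(start, end - start);
            try
            {
                items.Add(XElement.Parse(fragment));
            }
            catch (XmlException ex)
            {
                // Report the broken item and carry on with the next one.
                if (onError != null) onError(ex, fragment);
            }
            pos = end;
        }
        return items;
    }
}
```

Each fragment gets a fresh parser, so an error in one item cannot leave a shared reader in the error state. An item whose closing tag is missing will swallow the following item, which is the main limitation of splitting on text.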
ignatandrei
Re: XML Parser handling invalid XML
Jan 05, 2012 08:04 AM
As far as I know, no.
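A small probe that makes the behaviour visible, as a sketch only. `ErrorStateDemo` and `Probe` are names invented for this example. It reads until the first XmlException and then asks the same reader for its ReadState and for one more Read(); the expectation is that the state is reported as Error and that the reader does not advance any further.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ErrorStateDemo
{
    // Reads until the first XmlException, then reports what the reader
    // says about itself afterwards.
    public static string Probe(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            try
            {
                while (reader.Read()) { }
                return "no error";
            }
            catch (XmlException)
            {
                // Try to continue with the same reader instance.
                bool readAgain = reader.Read();
                return reader.ReadState + ", Read() again: " + readAgain;
            }
        }
    }
}
```

Calling `ErrorStateDemo.Probe("<a><b></a>")` shows the state the reader is left in after the mismatched end tag, which is why recovery has to happen outside the reader, for example by creating a new reader on the remaining input.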
m.koopman
Re: XML Parser handling invalid XML
Jan 05, 2012 06:05 PM
Alright, thanks for the information. Will write my own XmlReader in that case...