The other day I came across the following exception:
"Response is not well-formed XML System.Xml.XmlException: '', hexadecimal value 0x13, is an invalid character."
This obviously occurred from an illegal character in something I was sending to a web service. I found that the best way for my application to prevent this was to remove any such characters long before I send the data to the
web service.
...Then I found some links to .NET code claiming to solve this issue, but I didn't really like the implementations. So here is my question: Does anyone have a decent method they could post that takes an input 'String' and
removes any illegal XML characters?
Here's a C# conversion of the Java code you posted a link to (you'll need to add "using System.Text;" at the top of your C# file for the StringBuilder class):
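public String stripNonValidXMLCharacters(string textIn)
{
    StringBuilder textOut = new StringBuilder(); // Used to hold the output.
    char current; // Used to reference the current character.
    if (textIn == null || textIn == string.Empty) return string.Empty; // vacancy test.
    for (int i = 0; i < textIn.Length; i++) {
        current = textIn[i];
        if ((current == 0x9 || current == 0xA || current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
        {
            textOut.Append(current);
        }
        else if (char.IsSurrogatePair(textIn, i))
        {
            // A C# char is a 16-bit UTF-16 code unit, so the 0x10000-0x10FFFF test
            // above can never match. Characters in that range arrive as a
            // high/low surrogate pair, so keep both halves and skip the second one.
            textOut.Append(current).Append(textIn[i + 1]);
            i++;
        }
    }
    return textOut.ToString();
}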
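As a possible alternative, here is a minimal sketch that leans on the `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair` helpers instead of hand-written ranges. It assumes .NET Framework 4.0 or later, where those helpers are available. The class and method names (`XmlCharFilter`, `RemoveInvalidXmlChars`) are made up for illustration and are not from this thread.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
public static class XmlCharFilter
{
    // Returns a copy of text containing only characters XmlConvert accepts.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        StringBuilder output = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid character: keep it.
                output.Append(c);
            }
            else if (i + 1 < text.Length &&
                     XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Note the argument order: low surrogate first, then high.
                // Keep both halves of the pair and skip the second one.
                output.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return output.ToString();
    }
}
```

Calling `XmlCharFilter.RemoveInvalidXmlChars(someString)` before sending the data to the web service should have the same effect as the hand-written range check above.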
if ((current == 0x9 || current == 0xA || current == 0xD) ||
((current >= 0x20) && (current <= 0xD7FF)) ||
((current >= 0xE000) && (current <= 0xFFFD)) ||
((current >= 0x10000) && (current <= 0x10FFFF)))
I am having difficulty getting the above line converted to VB.NET. See C# and VB.NET do character to integer conversion a little differently. What I end up with is the following design time error:
" Operator '=' is not defined for types 'Char' and 'Integer'"
The code by the way in VB.NET for that line is as follows:
If (current = &H9 OrElse current = &HA OrElse current = &HD) OrElse ((current >= &H20) AndAlso (current <= &HD7FF)) OrElse ((current >= &HE000) AndAlso (current <= &HFFFD)) OrElse ((current >= &H10000) AndAlso (current <= &H10FFFF)) Then
textOut.Append(current)
End If
Now I have been trying some combinations of getting the ASCII value via 'Asc()' or converting values to their hex value via conversion functions, but to no immediate avail.
Any ideas on how to make proper comparisons on the code above?
I think you need to use AscW to convert the character to an integer for the comparison and then ChrW to convert back to a character that you append:
Public Function stripNonValidXMLCharacters(ByVal textIn As String) As String
    ' Used to hold the output.
    Dim textOut As New StringBuilder()
    ' Used to hold the code of the current character.
    Dim current As Integer
    ' Vacancy test.
    If textIn Is Nothing OrElse textIn = String.Empty Then
        Return String.Empty
    End If
    For i As Integer = 0 To textIn.Length - 1
        current = AscW(textIn(i))
        If (current = &H9 OrElse current = &HA OrElse current = &HD) _
            OrElse ((current >= &H20) AndAlso (current <= &HD7FF)) _
            OrElse ((current >= &HE000) AndAlso (current <= &HFFFD)) _
            OrElse ((current >= &H10000) AndAlso (current <= &H10FFFF)) Then
            textOut.Append(ChrW(current))
        End If
    Next
    Return textOut.ToString()
End Function
Untested!
Martin Honnen --- MVP Data Platform Development
Yes the last (2) posts were very helpful - many thanks to Brent and Martin. Brent especially, thank you for converting that Java to C#, and Martin, the 'AscW' function was needed as shown below.
Here is the VB.NET working version of removing illegal XML characters from a String:
Public Shared Function RemoveIllegalXMLCharacters(ByVal Content As String) As String
    'Used to hold the output.
    Dim textOut As New StringBuilder()
    'Used to reference the current character.
    Dim current As Char
    'Exit and return an empty string if nothing was passed in to the method
    If Content Is Nothing OrElse Content = String.Empty Then
        Return String.Empty
    End If
    'Loop through the length of the content one character at a time to see if there
    'are any illegal characters to be removed:
    For i As Integer = 0 To Content.Length - 1
        'Reference the current character
        current = Content(i)
        'Only append valid characters back to the StringBuilder.
        'Note: AscW returns a 16-bit code unit (0 to 65535), so the last range below
        'never matches; characters above &HFFFF arrive as surrogate pairs and are
        'dropped by this loop.
        If (AscW(current) = &H9 OrElse AscW(current) = &HA OrElse AscW(current) = &HD) _
            OrElse ((AscW(current) >= &H20) AndAlso (AscW(current) <= &HD7FF)) _
            OrElse ((AscW(current) >= &HE000) AndAlso (AscW(current) <= &HFFFD)) _
            OrElse ((AscW(current) >= &H10000) AndAlso (AscW(current) <= &H10FFFF)) Then
            textOut.Append(current)
        End If
    Next
    'Return the screened content with only valid characters
    Return textOut.ToString()
End Function
Need a method that removes illegal XML characters from a String
Oct 21, 2009 09:28 AM|atconway|LINK
The other day I came across the following exception:
"Response is not well-formed XML System.Xml.XmlException: '', hexadecimal value 0x13, is an invalid character."
This obviously occurred from an illegal character in something I was sending to a web service. I found that the best way for my application to prevent this was to remove any such characters long before I send the data to the web service.
I found the following link, which seemed like the solution. The only problem is that the code is in Java and I cannot get it converted: http://benjchristensen.com/2008/02/07/how-to-strip-invalid-xml-characters/
...Then I found some links to .NET code claiming to solve this issue, but I didn't really like the implementations. So here is my question: Does anyone have a decent method they could post that takes an input 'String' and removes any illegal XML characters?
Thank you!
Re: Need a method that removes illegal XML characters from a String
Oct 21, 2009 11:22 AM|uid117455|LINK
Here's a C# conversion of the Java code you posted a link to (you'll need to add "using System.Text;" at the top of your C# file for the StringBuilder class):
Re: Need a method that removes illegal XML characters from a String
Oct 21, 2009 12:26 PM|atconway|LINK
I am having difficulty getting the above line converted to VB.NET. See C# and VB.NET do character to integer conversion a little differently. What I end up with is the following design time error:
" Operator '=' is not defined for types 'Char' and 'Integer'"
The code by the way in VB.NET for that line is as follows:
Now I have been trying some combinations of getting the ASCII value via 'Asc()' or converting values to their hex value via conversion functions, but to no immediate avail.
Any ideas on how to make proper comparisons on the code above?
Re: Need a method that removes illegal XML characters from a String
Oct 21, 2009 01:13 PM|Martin_Honnen|LINK
I think you need to use AscW to convert the character to an integer for the comparison and then ChrW to convert back to a character that you append:
Untested!
Re: Need a method that removes illegal XML characters from a String
Oct 21, 2009 02:24 PM|atconway|LINK
Yes the last (2) posts were very helpful - many thanks to Brent and Martin. Brent especially, thank you for converting that Java to C#, and Martin, the 'AscW' function was needed as shown below.
This link has the full explanation as well:
http://allen-conway-dotnet.blogspot.com/2009/10/how-to-strip-illegal-xml-characters.html
Here is the VB.NET working version of removing illegal XML characters from a String:
Re: Need a method that removes illegal XML characters from a String
Dec 17, 2009 05:57 PM|haydenal|LINK
Any thoughts on how one might save the resulting string back off as an xml file?
Re: Need a method that removes illegal XML characters from a String
Dec 18, 2009 01:57 AM|kavita_khandhadia|LINK
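// Requires "using System.Xml;" for the XmlDocument class.
string result = "<A/>"; // The cleaned string; it must be well-formed XML markup.
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(result); // Parses the string and throws XmlException if it is not well-formed.
xDoc.Save("MyNewFile.xml"); // Writes the document to a file.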
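If the cleaned string is plain text rather than XML markup, `LoadXml` will reject it. One option might be to wrap the text in an element and let the framework do the escaping. This is a minimal sketch, assuming .NET 3.5 or later for `System.Xml.Linq`. The class name, method name, and the `content` element name are hypothetical and chosen only for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class XmlFileSaver
{
    // Wraps already-cleaned text in a single root element and saves it to path.
    // Characters such as < and & are escaped by XElement when the file is written.
    public static void SaveTextAsXml(string cleanedText, string path)
    {
        // "content" is an assumed element name, not something from the thread.
        XDocument doc = new XDocument(new XElement("content", cleanedText));
        doc.Save(path);
    }
}
```

For example, `XmlFileSaver.SaveTextAsXml(cleaned, "MyNewFile.xml")` would write the text as the value of one `content` element. The text should already have had its illegal characters removed, since the writer is expected to reject them otherwise.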
I would love to change the world, but they wont give me the source code.
Re: Need a method that removes illegal XML characters from a String
Oct 08, 2013 02:44 PM|El Bayames|LINK
I used a regular expression (see http://stackoverflow.com/questions/730133/invalid-characters-in-xml ). This below is the code.
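The poster's code is not included here, so the following is not their implementation. It is only a minimal sketch of what a regular-expression approach could look like in C#, assuming the same XML 1.0 character ranges used earlier in this thread. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class XmlRegexFilter
{
    // Matches UTF-16 code units that are not allowed in XML 1.0:
    //   1. control characters other than tab, line feed and carriage return,
    //      plus U+FFFE and U+FFFF
    //   2. a high surrogate that is not followed by a low surrogate
    //   3. a low surrogate that is not preceded by a high surrogate
    // Well-formed surrogate pairs (U+10000 to U+10FFFF) are left alone.
    private static readonly Regex InvalidXmlChars = new Regex(
        @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]",
        RegexOptions.Compiled);

    // Returns a copy of text with every match of the pattern removed.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        return InvalidXmlChars.Replace(text, string.Empty);
    }
}
```

The regex version is shorter than the loop, at the cost of a pattern that is harder to read. Either approach should produce the same result for the same input.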