Need a method that removes illegal XML characters from a String

Last post 12-18-2009 1:57 AM by kavita_khandhadia. 6 replies.

  • Need a method that removes illegal XML characters from a String

    10-21-2009, 9:28 AM
    • Contributor
      6,819 point Contributor
    • atconway
    • Member since 09-24-2007, 9:20 PM
    • Florida U.S.A
    • Posts 1,367

     The other day I came across the following exception:

    "Response is not well-formed XML  System.Xml.XmlException: '', hexadecimal value 0x13, is an invalid character."

    This was evidently caused by an illegal character in data I was sending to a web service.  The best way for my application to prevent it seemed to be removing any such characters well before the data is sent to the web service.
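The exception quoted above can be reproduced without a web service. The following is a minimal sketch, assuming the standard System.Xml parser; the class name, method name, and the <payload> element are illustrative and not part of the original application.

```csharp
using System;
using System.Xml;

public static class InvalidCharDemo
{
    // Returns true when the XML parser rejects the given text as element content.
    public static bool IsRejectedByParser(string text)
    {
        try
        {
            new XmlDocument().LoadXml("<payload>" + text + "</payload>");
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Ordinary text: no exception expected.
        Console.WriteLine(IsRejectedByParser("plain text"));
        // Contains U+0013, which XML 1.0 does not allow: XmlException expected.
        Console.WriteLine(IsRejectedByParser("bad\u0013value"));
    }
}
```

Escaping does not help here, because XML 1.0 has no legal way to represent most control characters; removing them before serialization is the usual approach.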

    I found the following link, which seemed like the solution. The only problem is that the code is in Java and I cannot get it converted: http://benjchristensen.com/2008/02/07/how-to-strip-invalid-xml-characters/

    I then found some links to .NET code for this issue, but I didn't really like the implementations.  So here is my question:  does anyone have a decent method they could post that takes an input 'String' and removes any illegal XML characters from it?

    Thank you!

    Thank you,   >[Blog]<

    "The best thing about a boolean is even if you are wrong, you are only off by a bit." :D
    -anonymous

  • Re: Need a method that removes illegal XML characters from a String

    10-21-2009, 11:22 AM
    Answer
    • Participant
      958 point Participant
    • Brent Jenkins
    • Member since 11-07-2007, 10:43 PM
    • UK
    • Posts 177

    Here's a C# conversion of the Java code you posted a link to (you'll need to add "using System.Text;" at the top of your C# file for the StringBuilder class):

        public string stripNonValidXMLCharacters(string textIn)
        {
            if (string.IsNullOrEmpty(textIn)) return string.Empty; // vacancy test.

            StringBuilder textOut = new StringBuilder(textIn.Length); // Used to hold the output.
            char current; // Used to reference the current character.

            for (int i = 0; i < textIn.Length; i++)
            {
                current = textIn[i];

                if ((current == 0x9 || current == 0xA || current == 0xD) ||
                    ((current >= 0x20) && (current <= 0xD7FF)) ||
                    ((current >= 0xE000) && (current <= 0xFFFD)) ||
                    ((current >= 0x10000) && (current <= 0x10FFFF)))
                {
                    textOut.Append(current);
                }
                else if (char.IsSurrogatePair(textIn, i))
                {
                    // A char is one 16-bit UTF-16 code unit, so the last range above can
                    // never match; characters above 0xFFFF arrive as a surrogate pair.
                    // Keep both halves and skip the second one.
                    textOut.Append(current).Append(textIn[++i]);
                }
            }
            return textOut.ToString();
        }
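As an alternative to spelling out the ranges by hand, the same filter can be sketched with the framework's own checks. This assumes .NET Framework 4.0 or later, where XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are available; the class and method names below are illustrative.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Illustrative helper: keeps only the characters that XML 1.0 allows.
    public static string StripInvalidXmlChars(string textIn)
    {
        if (string.IsNullOrEmpty(textIn)) return string.Empty;

        StringBuilder textOut = new StringBuilder(textIn.Length);
        for (int i = 0; i < textIn.Length; i++)
        {
            char current = textIn[i];
            if (XmlConvert.IsXmlChar(current))
            {
                textOut.Append(current);
            }
            else if (i + 1 < textIn.Length &&
                     XmlConvert.IsXmlSurrogatePair(textIn[i + 1], current))
            {
                // Note the argument order: low surrogate first, then high surrogate.
                textOut.Append(current).Append(textIn[i + 1]);
                i++; // the low surrogate has already been consumed
            }
        }
        return textOut.ToString();
    }

    public static void Main()
    {
        // U+0013 and U+FFFF are illegal; the emoji is a valid surrogate pair.
        string dirty = "ok\u0013 \uD83D\uDE00 done\uFFFF";
        Console.WriteLine(StripInvalidXmlChars(dirty).Length);
    }
}
```

The surrogate-pair branch matters because characters above U+FFFF are legal XML but occupy two char values in a .NET string.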


     

    Brent Jenkins
    Director, Vale Web Design Ltd
    Web: www.valewebdesign.co.uk
    LinkedIn: http://www.linkedin.com/in/valewebdesign
  • Re: Need a method that removes illegal XML characters from a String

    10-21-2009, 12:26 PM

    Brent Jenkins:
        if ((current == 0x9 || current == 0xA || current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))

    I am having difficulty converting the condition above to VB.NET.  C# and VB.NET handle character-to-integer conversion a little differently, and what I end up with is the following design-time error:

    " Operator '=' is not defined for types 'Char' and 'Integer'"

    By the way, the VB.NET code for that line is as follows:

                    If (current = &H9 OrElse current = &HA OrElse current = &HD) OrElse ((current >= &H20) AndAlso (current <= &HD7FF)) OrElse ((current >= &HE000) AndAlso (current <= &HFFFD)) OrElse ((current >= &H10000) AndAlso (current <= &H10FFFF)) Then
                        textOut.Append(current)
                    End If

    I have been trying some combinations of getting the ASCII value via 'Asc()' or converting the values to their hex values via conversion functions, but to no immediate avail.

    Any ideas on how to make proper comparisons on the code above?
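For comparison, the C# condition compiles because C# implicitly widens a char to int when it is compared with an integer literal, whereas VB.NET requires an explicit conversion. A minimal C# sketch of that implicit conversion follows; the class and method names are illustrative.

```csharp
using System;

public static class CharCompareDemo
{
    // True for the three control characters XML 1.0 allows: tab, LF and CR.
    public static bool IsAllowedControl(char current)
    {
        // 'current' is implicitly widened to int for each comparison.
        return current == 0x9 || current == 0xA || current == 0xD;
    }

    public static void Main()
    {
        char tab = '\t';
        int codeUnit = tab; // implicit char-to-int conversion, no cast needed
        Console.WriteLine(codeUnit);
        Console.WriteLine(IsAllowedControl('\u0013'));
    }
}
```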

     


  • Re: Need a method that removes illegal XML characters from a String

    10-21-2009, 1:13 PM
    Answer

    I think you need to use AscW to convert the character to an integer for the comparison and then ChrW to convert back to a character that you append:

        Public Function stripNonValidXMLCharacters(ByVal textIn As String) As String
            ' vacancy test.
            If String.IsNullOrEmpty(textIn) Then
                Return String.Empty
            End If
            ' Used to hold the output.
            Dim textOut As New StringBuilder()
            ' Used to hold the numeric value of the current character.
            Dim current As Integer
            For i As Integer = 0 To textIn.Length - 1
                current = AscW(textIn(i))

                If (current = &H9 OrElse current = &HA OrElse current = &HD) _
                   OrElse ((current >= &H20) AndAlso (current <= &HD7FF)) _
                   OrElse ((current >= &HE000) AndAlso (current <= &HFFFD)) _
                   OrElse ((current >= &H10000) AndAlso (current <= &H10FFFF)) Then
                    textOut.Append(ChrW(current))
                ElseIf Char.IsSurrogatePair(textIn, i) Then
                    ' AscW never returns more than &HFFFF, so the last range above cannot
                    ' match; characters above &HFFFF arrive as a surrogate pair.
                    ' Keep both halves and skip the second one.
                    textOut.Append(textIn(i)).Append(textIn(i + 1))
                    i += 1
                End If
            Next
            Return textOut.ToString()
        End Function

    Untested!

    Martin Honnen --- MVP XML
    My blog
  • Re: Need a method that removes illegal XML characters from a String

    10-21-2009, 2:24 PM
    Answer

    Yes, the last two posts were very helpful; many thanks to Brent and Martin.  Brent especially, thank you for converting that Java to C#, and Martin, the 'AscW' function was needed, as shown below.

    This link has the full explanation as well:

    http://allen-conway-dotnet.blogspot.com/2009/10/how-to-strip-illegal-xml-characters.html

    Here is the VB.NET working version of removing illegal XML characters from a String:

        Public Shared Function RemoveIllegalXMLCharacters(ByVal Content As String) As String

            'Used to hold the output.
            Dim textOut As New StringBuilder()
            'Used to reference the current character.
            Dim current As Char
            'Exit out and return an empty string if nothing was passed in to the method
            If String.IsNullOrEmpty(Content) Then
                Return String.Empty
            End If

            'Loop through the length of the content one character at a time to see if there
            'are any illegal characters to be removed:
            For i As Integer = 0 To Content.Length - 1
                'Reference the current character
                current = Content(i)
                'Only append valid (legal) characters to the StringBuilder
                If (AscW(current) = &H9 OrElse AscW(current) = &HA OrElse AscW(current) = &HD) _
                   OrElse ((AscW(current) >= &H20) AndAlso (AscW(current) <= &HD7FF)) _
                   OrElse ((AscW(current) >= &HE000) AndAlso (AscW(current) <= &HFFFD)) _
                   OrElse ((AscW(current) >= &H10000) AndAlso (AscW(current) <= &H10FFFF)) Then
                    textOut.Append(current)
                ElseIf Char.IsSurrogatePair(Content, i) Then
                    'AscW never returns more than &HFFFF, so the last range above cannot
                    'match; characters above &HFFFF arrive as a surrogate pair.
                    'Keep both halves and skip the second one.
                    textOut.Append(current).Append(Content(i + 1))
                    i += 1
                End If
            Next

            'Return the screened content with only valid characters
            Return textOut.ToString()

        End Function


     


  • Re: Need a method that removes illegal XML characters from a String

    12-17-2009, 5:57 PM
    • Member
      2 point Member
    • haydenal
    • Member since 12-17-2009, 10:53 PM
    • Posts 1

    Any thoughts on how one might save the resulting string back off as an xml file?

  • Re: Need a method that removes illegal XML characters from a String

    12-18-2009, 1:57 AM
    Answer

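    // Requires "using System.Xml;" at the top of the file.
    string result = "<A/>";      // the cleaned XML string
    XmlDocument xDoc = new XmlDocument();
    xDoc.LoadXml(result);        // parse the string into a document
    xDoc.Save("MyNewFile.xml");  // write the document to disk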
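If the cleaned string is plain text rather than complete XML markup, it can be wrapped in an element before saving. The following is a minimal sketch under that assumption; the class name, the "message" element, and the shortened range check are illustrative, and the shortened check drops surrogate pairs for brevity.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class SaveCleanXmlDemo
{
    // Shortened range check for the example; surrogate pairs are dropped here.
    public static string StripInvalid(string textIn)
    {
        StringBuilder textOut = new StringBuilder();
        foreach (char c in textIn ?? string.Empty)
        {
            if (c == 0x9 || c == 0xA || c == 0xD ||
                (c >= 0x20 && c <= 0xD7FF) || (c >= 0xE000 && c <= 0xFFFD))
            {
                textOut.Append(c);
            }
        }
        return textOut.ToString();
    }

    // Writes the cleaned text as the content of a single root element.
    public static void SaveAsXml(string rawText, string path)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("message"); // hypothetical element name
        root.InnerText = StripInvalid(rawText);          // <, > and & are escaped on save
        doc.AppendChild(root);
        doc.Save(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "MyNewFile.xml");
        SaveAsXml("a < b & c\u0013", path);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Assigning the text through InnerText lets the XML API handle escaping, so only the characters that XML cannot represent at all need to be stripped beforehand.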

    Please mark this post as Answer if it is of help to you!

    " Every wall is a door..! "