Removing Special Characters

Last post 10-29-2007 1:25 AM by Samu Zhang. 4 replies.

  • Removing Special Characters

    10-24-2007, 3:13 PM
    • mobfigr
    • Joined on 09-19-2005, 6:15 PM
    • Austin, TX
    • Posts 97

    Team,

    I've got a problem parsing special characters in XML. I was given a really badly formatted XML file that has a bunch of special characters; the majority of these look like "squares" in Notepad.

    I've already tried a bunch of different things, like setting the encoding to "UTF-8" and inserting <?xml version="1.0" encoding="UTF-8"?> as the first line of the document.

    To set the encoding programmatically, I set the StreamReader's encoding to UTF-8. I then use this to write the XML data to a file that I later load using the .Load() function. But this is where it tanks: it reports an unrecognized character, which is the square-like character. I'm free to delete this character; that is entirely up to me. But how do I do it? How do I recognize that the character read from the stream was the square-looking character and then delete it?

    Any help would be much appreciated. Thanks.

    there can be no pact between lions and men...
    http://ireuben.net
  • Re: Removing Special Characters - Part One

    10-24-2007, 4:50 PM
    • smcirish
    • Joined on 04-16-2007, 9:27 PM
    • Texas
    • Posts 184

    I had this problem. It made the formatting really bad whenever items with those special characters were printed. It would help if you knew exactly what the special character was.

    Here is the code I used to determine what the special character was at position 4 of TextBox2:

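            ' Substring is zero-based, so index 4 is the fifth character
            Dim pos_num As Integer = 4
            Dim str2 As String = TextBox2.Text.Substring(pos_num, 1)

            ' AscW returns the Unicode code point; Asc would map the
            ' character through the current ANSI code page instead
            Dim asc_num As Integer = AscW(str2)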
      

    Then use this table to determine what the special character is: http://www.asciitable.com/

    I was able to determine that most of my special characters were:

            TAB    Chr(9)
            LF     Chr(10)
            VT     Chr(11)
            FF     Chr(12)
            CR     Chr(13)
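
    To check a whole string instead of one position, a loop along these lines may help. This is only a sketch: FindControlChars and the sample string are made up for illustration, and Char.IsControl is used as a rough stand-in for "special character".

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module ControlCharScan
    ' Returns one "index: code" entry for each control character in the input.
    ' (Hypothetical helper, not part of the original post.)
    Function FindControlChars(ByVal text As String) As List(Of String)
        Dim hits As New List(Of String)
        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)
            If Char.IsControl(c) Then
                ' AscW gives the Unicode code point of the character.
                hits.Add(i.ToString() & ": " & AscW(c).ToString())
            End If
        Next
        Return hits
    End Function

    Sub Main()
        ' Sample input: a TAB after "ab" and a form feed after "cd".
        For Each hit As String In FindControlChars("ab" & Chr(9) & "cd" & Chr(12))
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```

    For the sample string this prints 2: 9 and 5: 12, meaning a TAB at index 2 and a FF at index 5 (indexes are zero-based).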

     

    -smcirish

    ~ Remember To Mark The Posts Which Helped You As The ANSWER ~
  • Re: Removing Special Characters - Part Two

    10-24-2007, 5:02 PM
    Answer
    • smcirish
    • Joined on 04-16-2007, 9:27 PM
    • Texas
    • Posts 184

    Then I found a function called Regex.Replace to find and clean out the bad characters.
    See Book: Visual Basic.Net Text Manipulation Handbook by Francois Liger, Craig McQueen, Paul Wilton

    http://www.amazon.com/Visual-Basic-NET-Manipulation-Handbook/dp/1861007302/ref=sr_1_1/105-6128760-9736442?ie=UTF8&s=books&qid=1193259222&sr=1-1

    Try this to clean the data the user enters in TextBox2 before saving it to the database.

            ' Requires: Imports System.Text.RegularExpressions
            ' Removes TAB (9), LF (10), VT (11), FF (12) and CR (13) in a single pass
            TextBox2.Text = Regex.Replace(TextBox2.Text, "[\x09-\x0D]", "")
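
    A note for the XML case: TAB, LF and CR are legal in XML 1.0, so removing them is not what lets .Load() succeed. The characters the parser rejects are the other control characters below Chr(32) (VT and FF among them), plus U+FFFE and U+FFFF. Here is a minimal sketch that strips only those, assuming the text is cleaned before it is parsed. StripInvalidXmlChars is a hypothetical helper name, and unpaired surrogates are not handled.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml
Imports Microsoft.VisualBasic

Module XmlCleaner
    ' C0 control characters other than TAB, LF and CR, plus U+FFFE and U+FFFF.
    Private ReadOnly InvalidXmlChars As New Regex("[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]")

    ' Hypothetical helper: removes characters that XML 1.0 does not allow.
    Function StripInvalidXmlChars(ByVal text As String) As String
        Return InvalidXmlChars.Replace(text, "")
    End Function

    Sub Main()
        ' Sample input with a vertical tab (code 11) inside the element text.
        Dim raw As String = "<item>bad" & ChrW(11) & "value</item>"
        Dim doc As New XmlDocument()
        doc.LoadXml(StripInvalidXmlChars(raw))
        Console.WriteLine(doc.DocumentElement.InnerText)
    End Sub
End Module
```

    With the sample input, the cleaned string loads without an exception and the program prints badvalue. Passing raw to LoadXml directly would throw an XmlException for the 0x0B character.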

    -smcirish

  • Re: Removing Special Characters - Part Two

    10-25-2007, 5:38 PM
    • mobfigr
    • Joined on 09-19-2005, 6:15 PM
    • Austin, TX
    • Posts 97

    smcirish, thanks for the posts. This is great; I'm going to try it tonight and leave feedback ASAP. Thanks again.

  • Re: Removing Special Characters

    10-29-2007, 1:25 AM

    Hi mobfigr,

    As far as I know, this is an encoding issue. If you set encoding='utf-8', please check whether the XML file is actually saved as UTF-8. The declared encoding and the file's actual encoding must be the same.
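
    If the file turns out to be saved in some other encoding, one option is to read it with that encoding and write it back out as UTF-8. This is a rough sketch: the file names are placeholders, ConvertToUtf8 is a hypothetical helper, and code page 1252 is only a guess at the source encoding. On .NET Core and later, code pages such as 1252 need the CodePagesEncodingProvider registered first.

```vb
Imports System.IO
Imports System.Text

Module Reencode
    ' Reads a file using the encoding it was actually saved in
    ' and rewrites it as UTF-8 without a byte order mark.
    Sub ConvertToUtf8(ByVal sourcePath As String, ByVal targetPath As String, ByVal sourceEncoding As Encoding)
        Dim content As String = File.ReadAllText(sourcePath, sourceEncoding)
        File.WriteAllText(targetPath, content, New UTF8Encoding(False))
    End Sub

    Sub Main()
        ' Placeholder file names; code page 1252 is an assumption about the source file.
        ConvertToUtf8("input.xml", "output.xml", Encoding.GetEncoding(1252))
    End Sub
End Module
```

    Any encoding named in the file's own <?xml ... ?> declaration would still need to say UTF-8 afterwards.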

     

    Sincerely,
    Samu Zhang
    Microsoft Online Community Support
