haggis999
Converting accented characters to non-accented equivalents
Feb 05, 2012 01:38 PM
I want to create a VB.NET function that will take a string containing accented characters and convert it into the non-accented equivalent. My particular requirement is associated with non-English names. Some real examples are Štimac, Hukić, Böttcher, Bjørnbakk, Fürnrohr and Synnevåg. I would want to convert those examples to Stimac, Hukic, Bottcher, Bjornbakk, Furnrohr and Synnevag, respectively.
I am aware that a ü (with an umlaut accent) is often replaced by 'ue' but replacing it with u is better for my purposes. Similarly, I want to replace ö with o, rather than 'oe'. I am assuming that all the required conversions will need to be hard-coded in the function.
Are there any slick ways to write such a function, ideally preserving the capitalisation in the original string?
David
saftrazink
Re: Converting accented characters to non-accented equivalents
Feb 06, 2012 01:17 AM
You can create a function like this:
Function c2(ByVal c1 As Char) As Char
    Dim dcc As New Dictionary(Of Char, Char)
    dcc.Add("ć"c, "c"c)
    dcc.Add("ö"c, "o"c)
    If dcc.ContainsKey(c1) Then
        Return dcc(c1)
    End If
    Return c1
End Function
and add more chars to the dictionary.
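One way to follow that suggestion without rebuilding the dictionary on every call is sketched below. The module name AccentMap, the function name ReplaceAccentedChar and the particular entries are assumptions chosen for illustration, not a complete list. Listing upper- and lower-case letters separately is what preserves capitalisation.

```vb
Imports System.Collections.Generic

Module AccentMap
    ' Built once and reused on every call. The entries are illustrative
    ' assumptions, not a complete list; add more pairs as needed.
    Private ReadOnly Replacements As New Dictionary(Of Char, Char) From {
        {"Š"c, "S"c}, {"š"c, "s"c},
        {"Ć"c, "C"c}, {"ć"c, "c"c},
        {"Ö"c, "O"c}, {"ö"c, "o"c},
        {"Ü"c, "U"c}, {"ü"c, "u"c},
        {"Ø"c, "O"c}, {"ø"c, "o"c},
        {"Å"c, "A"c}, {"å"c, "a"c}
    }

    ' Returns the mapped character, or c itself when it is not in the map.
    Public Function ReplaceAccentedChar(ByVal c As Char) As Char
        Dim plain As Char
        If Replacements.TryGetValue(c, plain) Then
            Return plain
        End If
        Return c
    End Function
End Module
```

TryGetValue performs the lookup once, where ContainsKey followed by the indexer performs it twice.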
haggis999
Re: Converting accented characters to non-accented equivalents
Feb 06, 2012 09:00 AM
Hi saftrazink,
Thanks for the suggestion, but I'm really looking for a function that will process a string not just a single character.
For example, if such a function was called ReplaceAccentedChars then I would want ReplaceAccentedChars("Fürnrohr") to give a result of "Furnrohr".
David
haggis999
Re: Converting accented characters to non-accented equivalents
Feb 09, 2012 12:40 PM
Is there no one out there with any code suggestions? Is this not the best forum to ask such questions?
David
Mamba Dai - ...
Microsoft
Re: Converting accented characters to non-accented equivalents
Feb 12, 2012 01:39 PM
Hi,
I am not sure which languages "Štimac, Hukić, Böttcher, Bjørnbakk, Fürnrohr and Synnevåg" come from, but a regular expression may address your requirement: you can use one to match names such as these and then replace each match with its corresponding unaccented form.
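A minimal sketch of the regular-expression idea follows, assuming it is applied per character rather than per whole name so that it also works for names not known in advance. The module name, the map and its entries are illustrative assumptions; ReplaceAccentedChars is the function name proposed earlier in the thread.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexAccentReplacer
    ' Illustrative entries only (an assumption, not a complete list).
    Private ReadOnly Map As New Dictionary(Of String, String) From {
        {"Š", "S"}, {"š", "s"},
        {"Ć", "C"}, {"ć", "c"},
        {"Ö", "O"}, {"ö", "o"},
        {"Ü", "U"}, {"ü", "u"},
        {"Ø", "O"}, {"ø", "o"},
        {"Å", "A"}, {"å", "a"}
    }

    ' Character class listing exactly the keys of Map, so every match
    ' is guaranteed to have an entry in the dictionary.
    Private ReadOnly Accented As New Regex("[ŠšĆćÖöÜüØøÅå]")

    ' Replaces each matched character with its mapped equivalent.
    Public Function ReplaceAccentedChars(ByVal input As String) As String
        Return Accented.Replace(input, Function(m As Match) Map(m.Value))
    End Function
End Module
```

With these entries, ReplaceAccentedChars("Fürnrohr") gives "Furnrohr". The character class and the dictionary must be kept in step by hand, which is the main drawback of this approach.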
saftrazink
Re: Converting accented characters to non-accented equivalents
Feb 13, 2012 08:28 PM
Function str2(ByVal s1 As String) As String
    Dim s2 As String = ""
    For i = 0 To s1.Length - 1
        s2 += c2(s1(i))
    Next
    Return s2
End Function
Function c2(ByVal c1 As Char) As Char
    Dim dcc As New Dictionary(Of Char, Char)
    dcc.Add("ć"c, "c"c)
    dcc.Add("ö"c, "o"c)
    ' more chars
    If dcc.ContainsKey(c1) Then
        Return dcc(c1)
    End If
    Return c1
End Function
AdrianParker
Re: Converting accented characters to non-accented equivalents
May 25, 2012 03:42 PM
Imports System.Globalization
Imports System.Text

Public Function RemoveAccents(ByVal Str As String) As String
    ' Decompose each accented letter into its base letter plus combining marks
    Dim NormalisedString As String = Str.Normalize(NormalizationForm.FormD)
    Dim SB As New StringBuilder
    For Each ch As Char In NormalisedString
        ' Keep everything except the combining marks
        If CharUnicodeInfo.GetUnicodeCategory(ch) <> UnicodeCategory.NonSpacingMark Then
            SB.Append(ch)
        End If
    Next
    Return SB.ToString()
End Function
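One caveat with the normalisation approach: letters such as ø have no canonical decomposition in Unicode, so FormD leaves them untouched and "Bjørnbakk" comes back unchanged. A hedged sketch that combines normalisation with a small fallback table follows. RemoveAccentsWithFallback, the module name and the table contents are hypothetical names and illustrative entries, not part of the answer above.

```vb
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text

Module AccentStripper
    ' Letters that FormD does not split into base letter + combining mark.
    ' Illustrative entries only (an assumption, not an exhaustive list).
    Private ReadOnly Fallback As New Dictionary(Of Char, String) From {
        {"Ø"c, "O"}, {"ø"c, "o"},
        {"Đ"c, "D"}, {"đ"c, "d"},
        {"Ł"c, "L"}, {"ł"c, "l"}
    }

    Public Function RemoveAccentsWithFallback(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        For Each ch As Char In input.Normalize(NormalizationForm.FormD)
            Dim plain As String = Nothing
            If Fallback.TryGetValue(ch, plain) Then
                ' No decomposition exists, so substitute from the table
                sb.Append(plain)
            ElseIf CharUnicodeInfo.GetUnicodeCategory(ch) <> UnicodeCategory.NonSpacingMark Then
                ' Keep base letters and anything else that is not a combining mark
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

With this sketch, RemoveAccentsWithFallback("Bjørnbakk") gives "Bjornbakk", while names such as "Štimac" and "Synnevåg" are still handled by the normalisation step alone. Capitalisation is preserved because the base letter keeps its case after decomposition.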