Splitting strings with Pascal notation.

Last post 05-09-2008 10:54 AM by Svante. 17 replies.


  • Splitting strings with Pascal notation.

    05-08-2008, 5:28 PM
    • JByrd2007
    • Joined on 07-25-2007, 3:08 PM
    • Greensboro, NC
    • Posts 29

    I have an enum with values composed of words (e.g. "BackToBack") and know that I can use the ToString() method to convert the enum to a string representation. However, when displaying this to the user, I want it to appear as separate words (e.g. "Back To Back"). Basically, I want to do a Split, but there is no delimiter embedded in the string; the only thing to split on is the change of case.

    This seems like a good case for regular expressions, but I'm not sure (and I'm not fluent in them).

     I definitely don't want to walk the string checking each character's case and building that way (yuck).

     Has anyone done anything like this?

     Ideas?

     Thanks for your help!

    Jon

     

    If I was able to help, please mark this post as Answer.
  • Re: Splitting strings with Pascal notation.

    05-08-2008, 5:48 PM
    Answer
    • johram
    • Joined on 06-13-2006, 6:36 AM
    • Sweden
    • Posts 1,864
    • Moderator
    If this post was useful to you, please mark it as answer. Thank you!
  • Re: Splitting strings with Pascal notation.

    05-09-2008, 3:19 AM
    • Svante
    • Joined on 02-12-2007, 12:15 PM
    • Stockholm, Sweden
    • Posts 1,548
    • Moderator

    JByrd2007:
     I definitely don't want to walk the string checking each character's case and building that way (yuck).

    Why yuck? Someone has to do it... A regex can certainly do the job, but as you say, you're not fluent in them, and few others are, so it'll be a piece of magic sitting in your code that will be hard to maintain and debug. Remember that regardless of the level of abstraction, some code will have to walk the string and check each character's case...

    The link provided by the other member seems to be a good compromise. Easy to read and maintain code, and you don't have to write it yourself.

    Svante
    AxCrypt - Free Open Source File Encryption & Online Password Manager - http://www.axantum.com
    [Disclaimer: Code snippets usually uncompiled, beware typos.]
    ______
    Don't forget to click "Mark as Answer" on the post(s) that helped you.
  • Re: Splitting strings with Pascal notation.

    05-09-2008, 4:40 AM
    • stmarti
    • Joined on 06-06-2006, 12:20 PM
    • Posts 539

    I would try regular expressions (if you don't know them, I highly recommend learning them; there are excellent tutorials on the web).

     

    		string test = "PascalCaseSomethingMatchAlsoASingleCharacter";
    		Response.Write( test + "<br />");
    		System.Text.RegularExpressions.Regex pattern = new System.Text.RegularExpressions.Regex( "[A-Z][a-z]*" );
    		System.Text.RegularExpressions.Match result = pattern.Match( test );
    		while( result.Success )
    		{
    			Response.Write( result.Value + "<br />" );
    			result = result.NextMatch( );
    		}

    Anyway, there could be problematic identifiers:

    What about this: "thisstartwithcamelcaseWhatIsNow"? For that you can use a pattern such as "([a-z]+|[A-Z][a-z]*)".

    And what about this: "ThisContainsSomeIDHowToTreatThis"? You can split it several ways: Some, ID, How or Some, IDH, ow :D
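    For illustration, the matches from the second pattern can be joined straight back into a display string. This is only a minimal sketch: the class and method names are made up, it is still ASCII-only, and any character that is not an ASCII letter (digits, underscores) is silently dropped because the pattern never matches it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PascalWords
{
    // Either a leading lower-case run ("thisstart...") or one upper-case
    // letter followed by zero or more lower-case letters ("What", "A").
    private static readonly Regex Word = new Regex("[a-z]+|[A-Z][a-z]*");

    public static string Split(string identifier)
    {
        List<string> words = new List<string>();
        foreach (Match m in Word.Matches(identifier))
        {
            words.Add(m.Value);
        }
        // Rejoin the matched words with single spaces.
        return string.Join(" ", words.ToArray());
    }
}
```

    With this sketch, PascalWords.Split("BackToBack") gives "Back To Back", while "SomeID" comes out as "Some I D", which is exactly the acronym problem described above.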

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 5:01 AM
    • johram
    • Joined on 06-13-2006, 6:36 AM
    • Sweden
    • Posts 1,864
    • Moderator

    Why not just copy-paste the code in the article I provided? You're gonna have a lot of trouble getting this right with a regexp. Even I, who tend to look for a regexp solution as the first choice, would avoid doing so in this case.

     

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 6:20 AM
    • stmarti
    • Joined on 06-06-2006, 12:20 PM
    • Posts 539

    I was just trying to show the asker an alternative.

    For me a 3-line regexp solution is simpler and cleaner than the article (there is also a more capable regexp solution in the article's comments: http://secretgeek.net/progr_purga.asp)

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 7:02 AM
    • rjcox
    • Joined on 12-19-2007, 2:14 PM
    • Basingstoke, UK
    • Posts 869

    stmarti:
    "[A-Z][a-z]*"
     

    That'll only handle ASCII, and it's too much work handling the results:

    private static Regex LowThenUpRegex = new Regex(@"(\p{Ll})(\p{Lu})", RegexOptions.None);
    static string Convert(string input) {
      return LowThenUpRegex.Replace(input, "$1 $2");
    }
    
    static void Main(string[] args) {
      string[] test = new[] {
        "one",
        "Two",
        "OneTwo",
        "oneTwo",
        "onetwo",
        "OneTWO",
        "OneTwoThreeFourFive",
        "ÁbcÈdfghÏjklmnØp",
      };
    
      foreach (string t in test) {
        Console.WriteLine("\"{0}\" => \"{1}\"", t, Convert(t));
      }
    }
    \p{Ll} matches a lower-case letter, based on Unicode data, so works beyond the basic 26 ASCII letters. \p{Lu} similarly matches upper case. This can be seen in the last test case.
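    The difference from the ASCII character classes can be checked side by side. A small sketch, with made-up class and method names, that applies the same replacement once with [a-z]/[A-Z] and once with the Unicode categories:

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison helper, for illustration only.
public static class AsciiVsUnicode
{
    // ASCII-only classes: accented letters match neither group.
    public static string SplitAscii(string input)
    {
        return Regex.Replace(input, "([a-z])([A-Z])", "$1 $2");
    }

    // Unicode general categories: Ll = lower-case letter, Lu = upper-case letter.
    public static string SplitUnicode(string input)
    {
        return Regex.Replace(input, @"(\p{Ll})(\p{Lu})", "$1 $2");
    }
}
```

    For "ÁbcÈdfgh" the ASCII version returns the input unchanged, because È is not in [A-Z], while the Unicode version returns "Ábc Èdfgh". For plain "OneTwo" both return "One Two".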
    Richard
  • Re: Splitting strings with Pascal notation.

    05-09-2008, 7:08 AM
    • johram
    • Joined on 06-13-2006, 6:36 AM
    • Sweden
    • Posts 1,864
    • Moderator

     

    stmarti:

    For me a 3 line regexp solution is simpler and cleaner than the article

    I totally agree with you. But the problem remains. You've gotta get the regexp right ;-)

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 7:34 AM
    • stmarti
    • Joined on 06-06-2006, 12:20 PM
    • Posts 539

    rjcox:
    private static Regex LowThenUpRegex = new Regex(@"(\p{Ll})(\p{Lu})", RegexOptions.None);

    static string Convert(string input) { return LowThenUpRegex.Replace(input, "$1 $2"); }

     

    That is it! A 2-line regexp solution :D

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 7:52 AM
    • Svante
    • Joined on 02-12-2007, 12:15 PM
    • Stockholm, Sweden
    • Posts 1,548
    • Moderator

    stmarti:

    rjcox:
    private static Regex LowThenUpRegex=new Regex(@"(\p{Ll})(\p{Lu})",RegexOptions.None);static string Convert(string input){return LowThenUpRegex.Replace(input,"$1 $2");}
     
    That is it! A 2 line regexp solution

    Actually, it's a one line solution, and with no extra spaces either! Let's not waste these spaces and new lines, they are scarce I've heard ;-)

    Reducing the number of lines is seldom equivalent to improving the code quality...

    Regular expressions are far too complicated and terse in their syntax to make for good code in most cases. They are also limited in their cultural awareness, and are not really suitable for handling human-generated input. Just see how hard it is to get a correct regex for this simple problem. The original code posted as a link is certainly longer, but at least 10 times as many developers are capable of maintaining that code for various upcoming requirements, perhaps:

    • If the string is in camel case, convert it as if it were Pascal case.
    • If the string contains known 2-letters uppercase acronyms, such as ID or IP, handle these.
    • Handle the letter classification according to the current system or UI-culture.
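    To make the comparison concrete, here is a minimal sketch of the loop-based style. It is not the code from the linked article; the class and method names are invented for illustration. It assumes that any run of capitals is an acronym (rather than checking a list of known ones), leaves a leading lower-case word as it is, and relies on char.IsUpper and char.IsLower, which classify by Unicode category and do not take the current culture into account.

```csharp
using System.Text;

// Hypothetical helper sketching the character-walking approach.
public static class WordSplitter
{
    public static string SplitWords(string identifier)
    {
        if (string.IsNullOrEmpty(identifier))
        {
            return identifier;
        }

        StringBuilder result = new StringBuilder(identifier.Length + 8);
        for (int i = 0; i < identifier.Length; i++)
        {
            char c = identifier[i];
            if (i > 0 && char.IsUpper(c))
            {
                bool afterLower = char.IsLower(identifier[i - 1]);
                bool afterUpper = char.IsUpper(identifier[i - 1]);
                bool beforeLower = i + 1 < identifier.Length
                    && char.IsLower(identifier[i + 1]);

                // Break at lower->Upper ("backTo" -> "back To") and before the
                // last capital of a run when a lower-case letter follows
                // ("IDHow" -> "ID How").
                if (afterLower || (afterUpper && beforeLower))
                {
                    result.Append(' ');
                }
            }
            result.Append(c);
        }
        return result.ToString();
    }
}
```

    With these assumptions, "ThisContainsSomeIDHowToTreatThis" becomes "This Contains Some ID How To Treat This". Each new requirement then becomes another explicit condition in the loop, which is the maintainability argument being made here.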

    I'm not saying you can't do this with regular expressions - I'm saying that fewer developers can, and they will require more time to do it, and even fewer will understand the result.

    Write code firstly for other programmers to read, understand, test and maintain. Secondly, when proven by measurements, code specific parts for performance. Never code for elegance or terseness at the expense of clarity.

    Svante
  • Re: Splitting strings with Pascal notation.

    05-09-2008, 8:05 AM
    • stmarti
    • Joined on 06-06-2006, 12:20 PM
    • Posts 539

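    I've just added one more line to it, like:

            string test = "IDSomeABCCamelCaseID";
            // Pass 1: break between a lower-case letter and a following upper-case letter.
            // Pass 2: break before an upper-case letter that starts a word; (?<=\S) skips the
            // start of the string and any break already inserted by pass 1.
            test = Regex.Replace( test, @"(\p{Ll})(\p{Lu})", "$1 $2" );
            test = Regex.Replace( test, @"(?<=\S)(\p{Lu})(\p{Ll})", " $1$2" );

    This handles identifiers such as IDSomePascalABCCase -> ID Some Pascal ABC Case

    (Digits, underscores etc. are treated as part of a word: 123IDSome123PascalABC123IDCase123 -> 123ID Some123 Pascal ABC123ID Case123)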
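    Tying this back to the original question, the two replacements could be wrapped in a helper that takes the enum value directly. A sketch only: EnumDisplay, ToDisplayString and the Seating enum are hypothetical names, and the second pattern uses a (?<=\S) lookbehind so that no leading or doubled space is produced.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical enum standing in for the asker's own type.
public enum Seating
{
    BackToBack,
    SideBySide,
    VIPOnly
}

// Hypothetical helper, for illustration only.
public static class EnumDisplay
{
    public static string ToDisplayString(Enum value)
    {
        string name = value.ToString();
        // Break between a lower-case letter and the upper-case letter after it.
        name = Regex.Replace(name, @"(\p{Ll})(\p{Lu})", "$1 $2");
        // Break before an upper-case letter that starts a word, unless it is
        // at the start of the string or already preceded by whitespace.
        name = Regex.Replace(name, @"(?<=\S)(\p{Lu})(\p{Ll})", " $1$2");
        return name;
    }
}
```

    EnumDisplay.ToDisplayString(Seating.BackToBack) then yields "Back To Back", and Seating.VIPOnly yields "VIP Only".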

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 8:11 AM
    • stmarti
    • Joined on 06-06-2006, 12:20 PM
    • Posts 539

    Svante:

    I'm not saying you can't do this with regular expressions - I'm saying that fewer developers can, and they will require more time to do it, and even fewer will understand the result.

    Write code firstly for other programmers to read, understand, test and maintain. Secondly, when proven by measurements, code specific parts for performance. Never code for elegance or terseness at the expense of clarity.

     

    Agreed :)

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 8:11 AM
    • johram
    • Joined on 06-13-2006, 6:36 AM
    • Sweden
    • Posts 1,864
    • Moderator

    Svante:

    Write code firstly for other programmers to read, understand, test and maintain. Secondly, when proven by measurements, code specific parts for performance. Never code for elegance or terseness at the expense of clarity.

    Very good rule of thumb! In the light of this discussion, I think LINQ falls into this category, being more of a complicator than a facilitator. Oops, I might have started a holy war here ;-)

  • Re: Splitting strings with Pascal notation.

    05-09-2008, 9:42 AM
    • rjcox
    • Joined on 12-19-2007, 2:14 PM
    • Basingstoke, UK
    • Posts 869

    Svante:
    Regular expressions are far too complicated and terse in their syntax to make for good code in most cases.
     

    Make that ...extreme cases and I could agree.

    While regular expressions are a DSL in their own right, they are very effective and efficient in their domain. If another language is a problem, then ASP.NET (needing C#, VB, HTML, CSS, JavaScript, SQL to start with) is not for you anyway.

    Svante:
    I'm not saying you can't do this with regular expressions - I'm saying that fewer developers can,
     

    The same comment would apply on replacing "regular expression" with any non-trivial technique or technology in use today.

    OTOH I regard anyone who absolutely rejects polyglot programming as incompetent, however good they are with their single language.

    Richard
  • Re: Splitting strings with Pascal notation.

    05-09-2008, 10:03 AM
    • JByrd2007
    • Joined on 07-25-2007, 3:08 PM
    • Greensboro, NC
    • Posts 29

    I don't know how you say, "Awesome" in Swedish, but that was Awesome!

    Thanks for the help!

     Jon 

     
