Last post Dec 08, 2014 12:07 AM by wim sturkenboom
Dec 07, 2014 12:50 PM|wim sturkenboom|LINK
I need to parse a simple string that ends with '_nn' (nn being 00 to 99 with leading zeroes for numbers below 10); e.g. abc_01 or abc_56
The problem is that I either get results in the first two groups (and the last two groups are empty) or in the last two groups (and the first two groups are empty). Is there a way to create a regexp that only returns two groups regardless of which part matches?
//edit: updated title
Dec 07, 2014 03:06 PM|gerrylowry|LINK
@wim sturkenb... Wim, for me, this is unclear because you need to show at least one example with more text so that we can get a better sense of the data that you need to parse.
paraphrasing Paul Linton, if you need to use a regular expression to solve a problem, you have two problems.
i do not know whether this applies to your case, but often, rather than regex, i choose to use .String methods. http://msdn.microsoft.com/en-us/library/system.string_methods(v=vs.110).aspx TIMTOWTDI
one primitive approach could be to search for the underscore, inspect the character after this underscore for any of 0...9 and, if found, inspect the following character likewise; if that passes, then determine whether the three characters before the underscore are alphabetic ... if true, you have found a string of the form xxx_nn.
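The steps above could be sketched roughly as follows. This is only an illustration of the String-methods idea, not tested against real data; the class name SuffixCheck and the method name ContainsXxxNn are made up for the example, and it assumes "alphabetic" means anything char.IsLetter accepts.

```csharp
using System;

public static class SuffixCheck
{
    // Returns true when 'input' contains three letters, an underscore
    // and two digits in a row (the form xxx_nn).
    public static bool ContainsXxxNn(string input)
    {
        if (input == null) return false;

        int pos = input.IndexOf('_');
        while (pos >= 0)
        {
            // Two characters after the underscore must exist and be 0..9.
            bool digitsOk = pos + 2 < input.Length
                && IsAsciiDigit(input[pos + 1])
                && IsAsciiDigit(input[pos + 2]);

            // Three characters before the underscore must exist and be letters.
            bool lettersOk = pos >= 3
                && char.IsLetter(input[pos - 1])
                && char.IsLetter(input[pos - 2])
                && char.IsLetter(input[pos - 3]);

            if (digitsOk && lettersOk) return true;

            // Try the next underscore, if there is one.
            pos = input.IndexOf('_', pos + 1);
        }
        return false;
    }

    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }
}
```

With this sketch, SuffixCheck.ContainsXxxNn("abc_01") is true, while "ab_12" (only two letters) and "abc_5" (only one digit) are rejected.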
again, it's hard to suggest an appropriate solution using regex, .String methods, or a combination of both because your data sample is just too small. FWIW
Dec 07, 2014 04:48 PM|Mikesdotnetting|LINK
If all you want to do is extract the number at the end of the string (assumed from the title of your post), the following will do it:
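var input = "abc_56";
// LastIndexOf rather than IndexOf, so an underscore earlier in the name is not taken for the separator
var number = Convert.ToInt32(input.Substring(input.LastIndexOf('_') + 1, 2));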
Dec 07, 2014 05:39 PM|gerrylowry|LINK
Mike, the O.P. is expecting "two groups", whatever that means:
"... that only returns two groups regardless of which part matches"
FWIW, unless the O.P. can guarantee that only 00..99 will be at the end of a string, then if Wim requires an Int32, Int32.TryParse needs to be used. http://msdn.microsoft.com/en-us/library/f02979c7(v=vs.110).aspx "Int32.TryParse Method (String, Int32)"
However, Wim did not explicitly state that the numeric part of the string is to be converted to Int32.
For that reason, Wim needs to show at least one example with more text so that we can get a better sense of the data to be parsed imho; if Wim intends to convert the 00..99 part to an integer, imho Wim should also mention that fact.
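If the number part does have to become an Int32, the Int32.TryParse suggestion could look something like the sketch below. The names SuffixNumber and TryGetNumber are hypothetical, and it assumes the number is whatever follows the last underscore and must be exactly two digits.

```csharp
using System;
using System.Globalization;

public static class SuffixNumber
{
    // Tries to read the text after the last underscore as an integer 0..99.
    // Returns false (and sets number to 0) when there is no underscore or
    // the tail is not exactly two digits.
    public static bool TryGetNumber(string input, out int number)
    {
        number = 0;
        if (input == null) return false;

        int pos = input.LastIndexOf('_');
        if (pos < 0) return false;

        string tail = input.Substring(pos + 1);

        // NumberStyles.None allows digits only: no sign, no whitespace.
        return tail.Length == 2
            && int.TryParse(tail, NumberStyles.None,
                            CultureInfo.InvariantCulture, out number);
    }
}
```

Unlike Convert.ToInt32, this does not throw on input such as "abc_x1"; it simply returns false.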
Dec 08, 2014 12:07 AM|wim sturkenboom|LINK
Sorry people for not being clear.
I need to split strings into two groups; the string consists of at least one character followed by an underscore followed by a two digit number (that includes leading zeroes if necessary).
When I split abc_01, I want to get back abc and 1. When I split abcd123efgh_97, I want to get back abcd123efgh and 97. I use regular expressions with grouping to extract the data. The given regular expression results in
| in     | grp1  | grp2 | grp3  | grp4 |
| abc_01 | 'abc' | '1'  | ''    | ''   |
| xyz_97 | ''    | ''   | 'xyz' | '97' |
Where grp1..4 are the groups that the earlier regular expression returns (actually it also returns a grp0 for the complete match). The single quotes are not part of the result and just for display purposes.
What I'm asking for is a regexp that always returns the result in grp1 and grp2 independent of the number at the end.
I have found a regexp that nearly does it, and I'm happy to use it, so from that perspective the thread is solved. The only disadvantage is that it does not limit the number to two digits (so wim_1234567 is considered valid while it is not); however, I can work around that. But if you can come up with a better one, you're welcome.
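For what it's worth, one possible pattern that always fills grp1 and grp2 and also rejects anything other than exactly two digits is sketched below. It is not the regexp mentioned above (that one is not shown in the thread); it is just one way to do it, using a lookahead to enforce the two-digit rule and an optional 0 outside the group to drop the leading zero. The class name TwoGroupSplit is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoGroupSplit
{
    // Group 1: everything before the last underscore (at least one character).
    // Group 2: the two-digit number without its leading zero.
    // The lookahead (?=[0-9]{2}$) only lets the match continue when exactly
    // two digits remain; 0? then consumes a leading zero outside group 2.
    public const string Pattern = @"^(.+)_(?=[0-9]{2}$)0?([0-9]{1,2})$";

    // Returns { grp1, grp2 } on success, or null when the input does not match.
    public static string[] Split(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value }
            : null;
    }
}
```

With this pattern abc_01 gives 'abc' and '1', abcd123efgh_97 gives 'abcd123efgh' and '97', abc_00 gives 'abc' and '0', and wim_1234567 does not match at all. Because it is a single pattern string, it can still live in a configuration file.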
I'd like to explain why I use regular expressions.
A big part of my life is spent writing small tools to process (fixed width, csv and xml) text files; processing usually consists of re-formatting fields, recombining (parts of) fields into other fields and moving fields around. I'm also crazy about flexibility and hate hard-coding; therefore the regular expression is stored in a configuration file. If the format of the input string ever changes, it's a matter of changing the regular expression to let the program do what it needs to do instead of modifying the code.
In the project that I'm working on, a (simplified) configuration entry looks like
<FieldName>The first field</FieldName>
In this case, the program will read a field in a record (position 123, 12 characters max), parse the information into an array using the inputformat (the regular expression) and write a modified output defined in outputformat. The output format is used as the format specifier in C#'s String.Format, while the array resulting from the parsing is used as the object array in String.Format. If the content needs to be organised differently, simply change the outputformat; if the input format changes, change the inputformat (sometimes not so simply, hence the question).
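A rough sketch of that flow, for readers who want to see it in code. This is not the actual program; FieldFormatter and Reformat are hypothetical names, the start position is assumed to be a zero-based offset, and group 0 (the complete match) is assumed to be left out of the array so that {0} in the outputformat refers to the first group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldFormatter
{
    // Cuts a field out of a fixed-width record, parses it with inputFormat
    // (a regular expression) and rebuilds it with outputFormat
    // (a String.Format format string).
    // Returns null when the field is missing or does not match.
    public static string Reformat(string record, int start, int maxLength,
                                  string inputFormat, string outputFormat)
    {
        if (start >= record.Length) return null;

        // Do not read past the end of a short record.
        int length = Math.Min(maxLength, record.Length - start);
        string field = record.Substring(start, length).Trim();

        Match m = Regex.Match(field, inputFormat);
        if (!m.Success) return null;

        // Skip group 0 so that {0} is the first capture group (assumption).
        object[] parts = m.Groups.Cast<Group>()
                                 .Skip(1)
                                 .Select(g => (object)g.Value)
                                 .ToArray();

        return string.Format(outputFormat, parts);
    }
}
```

For example, with the record "xxabc_01    yy", start 2, max length 10, an inputformat of ^(.+)_(?=[0-9]{2}$)0?([0-9]{1,2})$ and an outputformat of {1}-{0}, the result would be 1-abc.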
I hope this explains why I use regular expressions.