Marked as answer by theorytim.net on Oct 03, 2010 04:25 PM
Marked as answer by theorytim.net on Oct 04, 2010 06:59 PM
Just in case you want an attempt at an explanation of the regular expression, here you go:
"<ol" - Matches the literal string "<ol"
"[^>]*" - Matches any number of occurrences (0 to many) of any character except ">" (will match all potential attributes inside the ol-tag)
">" - Matches the literal string ">" (will match the end of the attributes for the ol-tag)
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
"<li>Disease Specific Outcomes:" - Matches the literal string.
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
"<ol" - Matches the literal string "<ol"
"[^>]*" - Matches any number of occurrences (0 to many) of any character except ">" (will match all potential attributes inside the ol-tag)
">" - Matches the literal string ">" (will match the end of the attributes for the ol-tag)
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
"(?:" - Starts a non-capturing group
"<li>" - Matches the literal string "<li>"
".*(?=</li>)" - Matches as many characters as possible (other than line breaks) up to a point that is followed by the literal string "</li>" (the "</li>" itself is not consumed here, since it is a positive lookahead)
"</li>" - Matches the literal string "</li>"
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
")*" - Ends the group and makes it match any number of times (0 to many)
"</ol>" - Matches the literal string "</ol>"
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
"</li>" - Matches the literal string "</li>"
"[^<]*" - Matches any number of occurrences of any character except "<" (will match all characters until the start of the next tag)
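The pieces listed above can be joined back into a single pattern. The following is only a sketch: the final "</ol>" is an assumption (the list above stops just before it, but the outer list needs closing to capture the whole section), and the class name, method name, and sample HTML are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTagsDemo
{
    // Pieces in the same order as the explanation above.
    private const string Pattern =
        @"<ol[^>]*>[^<]*" +
        @"<li>Disease Specific Outcomes:[^<]*" +
        @"<ol[^>]*>[^<]*" +
        @"(?:<li>.*(?=</li>)</li>[^<]*)*" +
        @"</ol>[^<]*</li>[^<]*" +
        @"</ol>"; // assumption: closing tag of the outer list, not in the list above

    // Returns the matched section, or an empty string when nothing matches.
    public static string ExtractSection(string html)
    {
        Match match = Regex.Match(html, Pattern);
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        // Illustrative input: one list item contains embedded tags.
        string html =
            "<ol style=\"list-style-type: disc;\">\n" +
            "<li>Disease Specific Outcomes:\n" +
            "<ol style=\"list-style-type: decimal;\">\n" +
            "<li>text text </li>\n" +
            "<li><strong>text text.</strong></li>\n" +
            "</ol>\n" +
            "</li>\n" +
            "</ol>\n" +
            "<p>trailing content</p>";

        Console.WriteLine(ExtractSection(html));
    }
}
```

Because "." does not match a line break by default, the ".*" inside the group stays within one line, which is what keeps each "<li>" item separate here. If the whole document arrives on a single line, the greedy ".*" can run past the intended item, so a lazy ".*?" may be the more conservative choice in that case.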
theoryTim.ne...
Regular Expression Help
Oct 01, 2010 11:30 PM
Hello all,
I am trying to extract a certain section (including HTML and whitespace) using a regex. The regex below used to work, but now the input contains line breaks and spaces, and it fails. My approach is also kind of sloppy, and I'm sure there is a cleaner way to achieve this.
Example text I am working with (I want to capture all text and HTML tags in the Disease Specific Outcomes section; this section will always contain that text):
<ol style="list-style-type: disc;">
<li>Disease Specific Outcomes:
<ol style="list-style-type: decimal;">
<li>text </li>
<li>text 2</li>
<li>text 3</li>
</ol>
</li>
</ol>
<ol style="list-style-type: disc;">
<li>General Outcome Measures
<ol style="list-style-type: decimal;">
<li>text </li>
<li>text </li>
<li>text </li>
<li>text </li>
</ol>
What I want to capture via the regex:
<ol style="list-style-type: disc;">
<li>Disease Specific Outcomes:
<ol style="list-style-type: decimal;">
<li>text </li>
<li>text 2</li>
<li>text 3</li>
</ol>
</li>
</ol>
I was using this, but now that there are line breaks the regex fails.
Regex r1 = new Regex("(<ol).*(Disease Specific Outcomes:).*?(General Outcome Measure)");
Regex r2 = new Regex(".+(</ol>)");
outcomeMeasures = r1.Match(outcomeMeasures).ToString();
outcomeMeasures = r2.Match(outcomeMeasures).ToString();
Please help..
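A note on why the pattern in this post stops matching once the input spans several lines: in .NET, "." matches any character except "\n" unless RegexOptions.Singleline is set, so ".*" cannot cross a line break. The sketch below only illustrates that behavior; the class name, method name, and sample HTML are assumptions, not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // First pattern from the post, unchanged.
    private const string Pattern =
        "(<ol).*(Disease Specific Outcomes:).*?(General Outcome Measure)";

    // True if the pattern matches the input under the given options.
    public static bool Matches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, Pattern, options);
    }

    public static void Main()
    {
        // Illustrative multi-line input.
        string html =
            "<ol style=\"list-style-type: disc;\">\n" +
            "<li>Disease Specific Outcomes:\n" +
            "<ol><li>text</li></ol>\n" +
            "</li>\n</ol>\n" +
            "<ol>\n<li>General Outcome Measures\n";

        Console.WriteLine(Matches(html, RegexOptions.None));       // False
        Console.WriteLine(Matches(html, RegexOptions.Singleline)); // True
    }
}
```

Setting RegexOptions.Singleline makes the original approach match again, but it still relies on greedy ".*" across the whole document, which is why a pattern anchored on the tags themselves tends to be more predictable.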
robert.weste...
Re: Regular Expression Help
Oct 02, 2010 08:22 AM
Hi theoryTim.net,
The following regular expression should do the trick:
Hope it helps!
/Robert
Don't forget to click "Mark as Answer" on the post that helped you.
This credits that member, earns you a point and marks your thread as Resolved so we will all know you have been helped.
theoryTim.ne...
Re: Regular Expression Help
Oct 02, 2010 05:16 PM
Thanks Robert..
I'll be the first to admit I am horrible with regular expressions.
Your result is very close. Is there any way to include all the <ol> tags and all the closing </li></ol> tags that come before the "General Outcome Measures" section?
Example of the text I am working with:
<ol style="list-style-type: disc;">
<li>Disease Specific Outcomes:
<ol style="list-style-type: decimal;">
<li>Text text</li>
<li>Text text</li>
<li>Text text</li>
</ol>
</li>
</ol>
<ol style="list-style-type: disc;">
<li>General Outcome Measures
<ol style="list-style-type: decimal;">
<li>Admission Assessments Completed:
<ol style="list-style-type: lower-alpha;">
<li>Text text</li>
<li>Text text</li>
<li>Text text</li>
</ol>
</li>
</ol>
<p> </p>
I get this:
<li>Disease Specific Outcomes:
<ol style="list-style-type: decimal;">
<li>Text text</li>
<li>Text text</li>
<li>Text text</li>
</ol>
</li>
const string regEx = @"<li>Disease Specific Outcomes:\s*?<ol[^>]*>\s*(?:(?<myGroup><li>[^<]*</li>)[^<]*)*</ol>[^<]*</li>";
Regex r1 = new Regex(regEx);
outcomeMeasures = r1.Match(outcomeMeasures).ToString();
The beginning and ending tags are missing. I greatly appreciate your help; the regex you provided looks very sophisticated and is beyond my editing ability.
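The pattern quoted in this post begins at "<li>Disease Specific Outcomes:" and ends at the matching "</li>", which is why the surrounding "<ol>" and "</ol>" are not part of the result. It also carries a named group, "myGroup", that collects each "<li>...</li>" item. The sketch below shows one way those captures could be read; the class name, method name, and sample input are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Pattern as quoted in the post above.
    private const string Pattern =
        @"<li>Disease Specific Outcomes:\s*?<ol[^>]*>\s*(?:(?<myGroup><li>[^<]*</li>)[^<]*)*</ol>[^<]*</li>";

    // Returns every <li>...</li> item captured by the repeated named group.
    public static List<string> ListItems(string html)
    {
        var items = new List<string>();
        Match match = Regex.Match(html, Pattern);
        if (!match.Success)
        {
            return items;
        }

        // A group inside a repeated construct records one Capture per iteration.
        foreach (Capture capture in match.Groups["myGroup"].Captures)
        {
            items.Add(capture.Value);
        }
        return items;
    }

    public static void Main()
    {
        // Illustrative input modeled on the example in the thread.
        string html =
            "<ol style=\"list-style-type: disc;\">\n" +
            "<li>Disease Specific Outcomes:\n" +
            "<ol style=\"list-style-type: decimal;\">\n" +
            "<li>text </li>\n<li>text 2</li>\n<li>text 3</li>\n" +
            "</ol>\n</li>\n</ol>";

        foreach (string item in ListItems(html))
        {
            Console.WriteLine(item);
        }
    }
}
```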
robert.weste...
Re: Regular Expression Help
Oct 03, 2010 01:15 AM
Hi theoryTim.net,
Indeed it should be possible. Just expand the regular expression with "<ol[^>]*>[^<]*" before the start of the previous regular expression and append "[^<]*</ol>" to the end of the string. I can't verify this at the moment, but that should do the trick.
Hope it helps!
/Robert
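Applied literally, the suggestion in this reply gives the pattern sketched below: the prefix and suffix are copied from the reply, and the middle part is the regex quoted earlier in the thread. The class name, method name, and sample input are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterListDemo
{
    // Regex quoted earlier in the thread (named group kept as-is).
    private const string Inner =
        @"<li>Disease Specific Outcomes:\s*?<ol[^>]*>\s*(?:(?<myGroup><li>[^<]*</li>)[^<]*)*</ol>[^<]*</li>";

    // Prefix and suffix as suggested in the reply above.
    private const string Pattern = @"<ol[^>]*>[^<]*" + Inner + @"[^<]*</ol>";

    // Returns the matched section, or an empty string when nothing matches.
    public static string ExtractSection(string html)
    {
        Match match = Regex.Match(html, Pattern);
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        // Illustrative input: the wanted section followed by another list.
        string html =
            "<ol style=\"list-style-type: disc;\">\n" +
            "<li>Disease Specific Outcomes:\n" +
            "<ol style=\"list-style-type: decimal;\">\n" +
            "<li>Text text</li>\n<li>Text text</li>\n" +
            "</ol>\n</li>\n</ol>\n" +
            "<ol style=\"list-style-type: disc;\">\n" +
            "<li>General Outcome Measures\n";

        // Prints the first outer <ol>...</ol> block only.
        Console.WriteLine(ExtractSection(html));
    }
}
```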
theoryTim.ne...
Re: Regular Expression Help
Oct 03, 2010 04:26 PM
Thank you Robert. Seems to work perfectly!
Now instead of worrying about a solution, I can try to understand one that works. Some day I'll be able to write regexes.
Thanks so much Robert!!!!!
theoryTim.ne...
Re: Regular Expression Help
Oct 03, 2010 06:17 PM
Sorry, but I do require your wisdom and assistance once again.
Sometimes there can be embedded tags within the <li> tags, which I'd like to capture too. Example:
<ol style="list-style-type: disc;">
<li>Disease Specific Outcomes:
<ol style="list-style-type: decimal;">
<li>text text </li>
<li>text text </li>
<li><strong><span style="font-style: underline">text text.</span></strong></li>
</ol>
</li>
</ol>
I have had no luck modifying the regex you provided using http://gskinner.com/RegExr/. Any ideas on this one?
robert.weste...
Re: Regular Expression Help
Oct 04, 2010 06:55 AM
Hi again,
Yes, you could use the following regular expression:
(I have removed the named group, since you did not seem to be using it.)
Hope it helps!
/Robert
robert.weste...
Re: Regular Expression Help
Oct 04, 2010 08:19 AM
Hi,
Just in case you want an attempt at an explanation of the regular expression, here you go:
/Robert
theoryTim.ne...
Re: Regular Expression Help
Oct 04, 2010 07:00 PM
I cannot thank you enough for this. This is gold in my eyes.
Btw, the alternate regex works as well. Thank you Robert, thank you very much!
theoryTim.ne...
Re: Regular Expression Help
Oct 08, 2010 03:25 AM
Robert,
I just sat down and went through the detailed write-up now that I have the time. I really think I learned something with your help. With my newfound confidence, I am looking forward to the next occasion where I need to apply a regex.
Thanks just one more time!