Parse HTML string to obtain all images references?

Last post 08-17-2008 3:14 PM by EvoPrototype. 4 replies.

  • Parse HTML string to obtain all images references?

    08-09-2007, 1:03 PM
    • Jungalist
    • Joined on 01-08-2004, 8:07 AM
    • London, Ontario
    • Posts 300

    I want to have some code run through a string of HTML source and find all of the image filenames. I will then take this list and copy the images from one folder to another. This is part of an application that lets the user select certain pages from a site and make a CD version of them. Currently the application just copies all the files in the site's image folder into the folder with the CD files, but this could get out of hand as more pages get added to the site, or if the user includes one of the pages that has an image gallery.

     

    My thought process has gone like this:

    ~ Remove all whitespace.
    ~ Look for the substring "images/" (all images will be in that folder)
    ~ Copy the next characters, up to a ' or a ", into an array

    Once I have the array I can loop through it and copy all the image files. The thing is, I have a nagging feeling that there is a better way to do this, though I am not sure why I think that. Am I heading down the right path? Any advice is appreciated.
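    For reference, the steps above might be sketched roughly like this. It is written in Python purely for illustration (the application itself is .NET), and the function name `find_image_paths` and its `marker` parameter are made up for the example. The whitespace-removal step is left out, on the assumption that reading up to the closing quote makes it unnecessary.

```python
def find_image_paths(html, marker="images/"):
    """Collect each substring that starts at `marker` and runs to the next quote.

    Hypothetical helper: assumes every image lives under `marker` and that
    every reference is closed by a ' or a " character.
    """
    paths = []
    pos = html.find(marker)
    while pos != -1:
        end = pos
        # Walk forward until a single or double quote (or the end of the string).
        while end < len(html) and html[end] not in "'\"":
            end += 1
        paths.append(html[pos:end])
        # Resume the search after the reference just collected.
        pos = html.find(marker, end)
    return paths
```

    A scan like this picks up any occurrence of "images/", including ones in links, scripts, or plain text, which may be part of why it feels fragile.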

  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 1:50 PM
    Answer

    You should look into regular expressions. These offer much more flexibility and power than the vanilla string parsing functions.

    For example, the regular expression at http://regexlib.com/REDetails.aspx?regexp_id=1397 - <img[^>]* src=\"([^\"]*)\"[^>]*> - matches each <img> tag and captures its src value in a group, which is what you want. In short, with this regular expression you could do what you want in about three lines of code.
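    As a rough sketch of how short that can be, here is the same pattern applied in Python (chosen only for illustration; the thread is about .NET). The names `IMG_SRC` and `image_sources` are made up for the example, the \" sequences are written as plain double quotes, and the case-insensitive flag is an added assumption so that tags written as <IMG SRC=...> are also found.

```python
import re

# The pattern quoted above, with \" written as a plain double quote.
# re.IGNORECASE is an assumption added here, not part of the original pattern.
IMG_SRC = re.compile(r'<img[^>]* src="([^"]*)"[^>]*>', re.IGNORECASE)


def image_sources(html):
    """Return the captured src value of every <img> tag the pattern matches."""
    return IMG_SRC.findall(html)
```

    Note that the pattern only matches src values in double quotes; single-quoted or unquoted attributes would need a different expression.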

    RegExLib.com is a great site to find common regular expression patterns along with information on how to use regular expressions in .NET.

    Happy Programming!


    -- Scott Mitchell
    -- mitchell@4guysfromrolla.com
    -- http://scottonwriting.net/sowblog/
    -- http://www.4GuysFromRolla.com/ScottMitchell.shtml
  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 2:35 PM
    • Jungalist

    Ahhh, I knew there had to be a more elegant way to do it, I just couldn't put my finger on it. Thanks for the info!

     

    <excited>
    Geez, I feel like a kid who just met a rock star - Scott Mitchell answered my post! ;)
    </excited>

  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 2:40 PM

    Jungalist:

    <excited>
    Geez, I feel like a kid who just met a rock star - Scott Mitchell answered my post! ;)
    </excited>

     

    Your profile pic helped. ;) I'm a big exploding dog fan, too.

    Happy Programming!


    -- Scott Mitchell
  • Re: Parse HTML string to obtain all images references?

    08-17-2008, 3:14 PM
    Thanks


     
