Parse HTML string to obtain all images references?

Last post 08-17-2008 3:14 PM by EvoPrototype. 4 replies.


  • Parse HTML string to obtain all images references?

    08-09-2007, 1:03 PM
    • Participant
      1,507 point Participant
    • Jungalist
    • Member since 01-08-2004, 8:07 AM
    • London, Ontario
    • Posts 311

    I want to have some code run through a string of HTML source and find all of the image filenames. I will then take this list and copy the images from one folder to another. This is part of an application which allows the user to select certain pages from a site and make a CD version of them. Currently the application just copies every file in the site's image folder into the folder with the CD files, but this could get out of hand as more pages are added to the site, or if the user includes one of the pages that has an image gallery.

     

    My thought process has gone like this:

    ~ Remove all whitespace.
    ~ Look for the substring "images/" (all images will be in that folder)
    ~ Copy the next characters up until a ' or a " into an array

    Once I have the array I can loop through it and copy all the image files. The thing is, I have a nagging feeling that there is a better way to do this, though I am not sure why I think that. Am I heading down the right path? Any advice is appreciated.
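
    For reference, the substring scan described in the steps above might look roughly like the sketch below. It is written in Python purely for illustration (the thread itself contains no code), and the function name, the `folder` parameter, and the sample markup are all made up for this example. It skips the whitespace-removal step, which this scan does not need, and it drops duplicate names so each file is copied only once.

```python
def find_image_names(html, folder="images/"):
    """Collect the file names that follow each occurrence of `folder`."""
    names = []
    pos = html.find(folder)
    while pos != -1:
        start = pos + len(folder)
        end = start
        # Advance until a closing quote (or the end of the string).
        while end < len(html) and html[end] not in "'\"":
            end += 1
        name = html[start:end]
        # Keep each file name once, in order of first appearance.
        if name and name not in names:
            names.append(name)
        pos = html.find(folder, end)
    return names


# Hypothetical sample markup, not taken from the original site.
sample = (
    '<img src="images/logo.png">'
    "<img src='images/photo.jpg'>"
    '<img src="images/logo.png">'
)
print(find_image_names(sample))  # ['logo.png', 'photo.jpg']
```

    Note that a scan like this also picks up "images/" wherever it appears in the page, for example in link targets or plain text, not only inside <img> tags.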

  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 1:50 PM
    Answer
    • Contributor
      4,100 point Contributor
    • Scott Mitchell
    • Member since 06-15-2002, 8:41 PM
    • San Diego, CA
    • Posts 707
    • ASPInsiders
      TrustedFriends-MVPs

    You should look into regular expressions. These offer much more flexibility and power than the vanilla string parsing functions.

    For example, the regular expression at http://regexlib.com/REDetails.aspx?regexp_id=1397 - <img[^>]* src=\"([^\"]*)\"[^>]*> - will match every <img> tag and capture its src value in a group, which is what you want. In short, using the above regular expression you could do what you want to do in about three lines of code.
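
    As a rough illustration of how short this can be, here is the same pattern applied in Python (chosen only for the sketch; in .NET the equivalent would go through the Regex class in System.Text.RegularExpressions). The function name, the sample markup, and the case-insensitive flag are assumptions added for this example and are not part of the RegExLib entry.

```python
import re

# The pattern quoted above, without the string-literal escaping of the quotes.
# IGNORECASE is an added assumption so that <IMG ...> is matched as well.
IMG_SRC = re.compile(r'<img[^>]* src="([^"]*)"[^>]*>', re.IGNORECASE)


def image_sources(html):
    """Return the src value of every <img> tag the pattern matches."""
    return IMG_SRC.findall(html)


# Hypothetical sample markup for illustration only.
sample = (
    '<p><img alt="Logo" src="images/logo.png" /> text '
    '<IMG src="images/photo.jpg"></p>'
)
print(image_sources(sample))  # ['images/logo.png', 'images/photo.jpg']
```

    One limitation worth knowing: as written, the pattern only matches src values in double quotes, so a tag such as <img src='a.png'> is skipped.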

    RegExLib.com is a great site to find common regular expression patterns along with information on how to use regular expressions in .NET.

    Happy Programming!


    -- Scott Mitchell
    -- mitchell@4guysfromrolla.com
    -- http://scottonwriting.net/sowblog/
    -- http://www.4GuysFromRolla.com/ScottMitchell.shtml
  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 2:35 PM
    • Participant
      1,507 point Participant
    • Jungalist
    • Member since 01-08-2004, 8:07 AM
    • London, Ontario
    • Posts 311

    Ahhh, I knew there had to be a more elegant way to do it, I just couldn't put my finger on it. Thanks for the info!

     

    <excited>
    Geez, I feel like a kid who just met a rock star - Scott Mitchell answered my post! ;)
    </excited>

  • Re: Parse HTML string to obtain all images references?

    08-09-2007, 2:40 PM
    • Contributor
      4,100 point Contributor
    • Scott Mitchell
    • Member since 06-15-2002, 8:41 PM
    • San Diego, CA
    • Posts 707
    • ASPInsiders
      TrustedFriends-MVPs

    Jungalist:

    <excited>
    Geez, I feel like a kid who just met a rock star - Scott Mitchell answered my post! ;)
    </excited>

     

    Your profile pic helped. ;) I'm a big Exploding Dog fan, too.

    Happy Programming!


    -- Scott Mitchell
    -- mitchell@4guysfromrolla.com
    -- http://scottonwriting.net/sowblog/
    -- http://www.4GuysFromRolla.com/ScottMitchell.shtml
  • Re: Parse HTML string to obtain all images references?

    08-17-2008, 3:14 PM
    • Member
      17 point Member
    • EvoPrototype
    • Member since 08-12-2008, 2:36 PM
    • Posts 42
    Thanks


     
