Dave's Brain


Date: 2010jun29
Language: perl

Q.  In perl, how can I parse XML and HTML without using a library or module?

A.  Use non-greedy matches, together with the s and g options.

For example, if you have an RSS feed, which is XML with
multiple <item> elements, do this:

    my @a = $content =~ m|<item>(.*?)</item>|sg;
    foreach my $i (@a) {
        print "item=$i\n";
    }
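Here is what's happening:

We use | to delimit the match so we don't have to escape the / in </item>.
The .* matches any run of characters.
Adding the ? makes it non-greedy, so each match stops at the nearest </item> and we get one <item> at a time. A greedy .* would run on to the last </item>, because the tags in between are just characters that . can match.
The s option lets . match newlines as well, so this works when an item spans several lines.
The g option gets all (global) matches. In list context, the match returns every captured item.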
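To see what the ? and the s option each contribute, here is a small sketch that runs the same pattern three ways. The sample feed in $content is made up for illustration, and the variable names (@lazy, @greedy, @no_s) are hypothetical, not from the original tip.

```perl
use strict;
use warnings;

# Hypothetical sample feed, made up for this illustration.
my $content = <<'XML';
<rss><channel>
<item>
<title>First</title>
</item>
<item>
<title>Second</title>
</item>
</channel></rss>
XML

# Non-greedy with s and g: each match stops at the nearest </item>.
my @lazy = $content =~ m|<item>(.*?)</item>|sg;

# Greedy: one match runs from the first <item> to the last </item>.
my @greedy = $content =~ m|<item>(.*)</item>|sg;

# Without s, "." does not match a newline, so multi-line items are missed.
my @no_s = $content =~ m|<item>(.*?)</item>|g;

printf "non-greedy: %d, greedy: %d, without s: %d\n",
    scalar @lazy, scalar @greedy, scalar @no_s;
# prints: non-greedy: 2, greedy: 1, without s: 0

# The same trick can be applied again inside each item.
foreach my $item (@lazy) {
    my ($title) = $item =~ m|<title>(.*?)</title>|s;
    print "title=$title\n";
}
```

This only matches a bare <item> tag. If the feed's tags carry attributes or the elements nest, the pattern would probably need adjusting.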

Copyright © 2008-2017, dave - Code samples on Dave's Brain is licensed under the Creative Commons Attribution 2.5 License. However other material, including English text has all rights reserved.