Parse HTML And Traverse DOM In PHP?

I wrote about this the other day, but I’ve since learned about PHP’s DOM functions, including DOMDocument::loadHTML(). Use it in combination with simplexml_import_dom() like this:

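$dom = new DOMDocument;
// loadHTML() returns false on failure; $dom itself is always truthy, so test the return value
if($dom->loadHTML('<ul><li>one</li><li>two</li><li>three<ul><li>sublist item</li></ul></li></ul>')){
	// convert the DOM tree to a SimpleXMLElement for easier traversal
	$xml = simplexml_import_dom($dom);
	print_r($xml);
}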
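
One detail that matters when traversing the result: loadHTML() wraps a bare fragment in html and body elements, so the list ends up under $xml->body rather than at the root. Here is a minimal sketch of walking the same markup; the variable names $items and $sub are mine, purely for illustration.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<ul><li>one</li><li>two</li><li>three<ul><li>sublist item</li></ul></li></ul>');
$xml = simplexml_import_dom($dom);

// loadHTML() adds the <html> and <body> wrappers, so the list is at body->ul
$items = array();
foreach ($xml->body->ul->li as $li) {
	// casting an element to a string gives its own text, not its child elements' text
	$items[] = trim((string) $li);
}
print_r($items);

// the nested list is reachable the same way, by indexing the third <li>
$sub = (string) $xml->body->ul->li[2]->ul->li;
echo $sub, "\n";
```

The string cast is what turns a SimpleXMLElement into plain text; without it you are still holding an object.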

This IBM developerWorks article has some more useful info.

Here’s some code I prototyped to parse out the ISBNs and LCCNs (or any data, really) from an average record in Scriblio:

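// $content holds the HTML of one record
$dom = new DOMDocument;
// only traverse if the HTML actually loaded; otherwise $xml would be undefined
if($dom->loadHTML($content)){
	$xml = simplexml_import_dom($dom);
	foreach($xml->body->ul->li as $thing){
		// the class attribute names the field; cast it to a string to compare
		if(in_array((string) $thing['class'], array('isbn', 'lccn'))){
			// each value sits in its own <li> of the nested list
			foreach($thing->ul->li as $stuff){
				print_r($stuff);
			}
		}
	}
}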
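
To try the loop without a live record, here is a self-contained sketch that collects the values into an array instead of printing them. The sample markup in $content is hypothetical, made up to match the shape the loop expects (a top-level list whose items carry a class and a nested list), and a real Scriblio record may be structured differently. The $wanted and $found names are mine as well.

```php
<?php
// Hypothetical record markup, invented for this example
$content = '<ul>'
	. '<li class="title">An Example Book</li>'
	. '<li class="isbn"><ul><li>0123456789</li><li>9780123456786</li></ul></li>'
	. '<li class="lccn"><ul><li>2001012345</li></ul></li>'
	. '</ul>';

$wanted = array('isbn', 'lccn'); // classes to pull out
$found = array();                // class name => list of values

$dom = new DOMDocument;
if ($dom->loadHTML($content)) {
	$xml = simplexml_import_dom($dom);
	foreach ($xml->body->ul->li as $thing) {
		$class = (string) $thing['class'];
		if (in_array($class, $wanted)) {
			foreach ($thing->ul->li as $stuff) {
				// cast to string to keep the text rather than the element object
				$found[$class][] = trim((string) $stuff);
			}
		}
	}
}
print_r($found);
```

Adding another field is then just a matter of adding its class name to $wanted.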