Parse HTML And Traverse DOM In PHP?

I spoke of this the other day, but now I’ve learned of PHP’s DOM functions, including loadHTML(). Use it in combination with simplexml_import_dom like this:

$dom = new DOMDocument;
// loadHTML() returns false on failure, so test its result rather than $dom itself
if($dom->loadHTML('<ul><li>one</li><li>two</li><li>three<ul><li>sublist item</li></ul></li></ul>')){
    $xml = simplexml_import_dom($dom);
    print_r($xml);
}

This IBM developerWorks article has some more useful info.
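
For the jQuery-style traversal itself, here is a minimal sketch using DOMXPath, which ships with the same DOM extension. The sample markup and the $items variable are my own assumptions for illustration; it simply collects the leading text of every li at any depth.

```php
<?php
// Hypothetical sample markup, matching the nested list used above.
$html = '<ul><li>one</li><li>two</li><li>three<ul><li>sublist item</li></ul></li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Query every <li>, at any depth, in document order.
$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//li') as $li) {
    // Take only the li's first child (its leading text node),
    // so the text of a nested list is not repeated in its parent.
    $items[] = trim($li->firstChild->nodeValue);
}

print_r($items);
```

XPath is more verbose than a jQuery selector, but '//li' plays roughly the same role as $('li').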




Parse HTML And Traverse DOM In PHP?

I love how easily I can traverse an HTML document with jQuery, and I’d love to be able to do it in PHP. There are a few classes, but the PHP binding for Tidy seems to be where it’s at. The Zend dev pages make it look that way, anyway.

Apache, MySQL, and PHP on MacOS X

p0ps Harlow tweeted something about trying to get an AMP environment running on his Mac. Conversation followed, and eventually I sent along an email that looked sorta like this:
If you’re running 10.4 (I doubt it, but it’s worth mentioning because I’m most familiar with it), here’s how I’ve set up dozens of machines for web development [...]

sifting results of error_log( …

sifting results of

error_log( $_SERVER['REQUEST_URI'] ."\n". $_SERVER['REMOTE_ADDR'] ."\n". print_r( debug_backtrace(), TRUE ) );
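
The print_r() dump of a full backtrace gets long fast. As a sketch of one way to make the log easier to sift, the hypothetical helper summarize_backtrace() below (my name, not a PHP function) reduces each frame to one line, and DEBUG_BACKTRACE_IGNORE_ARGS leaves out the argument dumps. The handle_request() function exists only to give the trace a frame.

```php
<?php
// summarize_backtrace() is a hypothetical helper, not part of PHP.
function summarize_backtrace(array $trace): string
{
    $lines = [];
    foreach ($trace as $i => $frame) {
        // 'file' and 'line' can be missing for calls made from PHP internals.
        $where = ($frame['file'] ?? '[internal]') . ':' . ($frame['line'] ?? '?');
        $call  = ($frame['class'] ?? '') . ($frame['type'] ?? '') . $frame['function'];
        $lines[] = "#$i $call() at $where";
    }
    return implode("\n", $lines);
}

// Stand-in for real application code, so the trace has one frame.
function handle_request(): string
{
    // DEBUG_BACKTRACE_IGNORE_ARGS skips argument values to keep the log short.
    return summarize_backtrace(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS));
}

$summary = handle_request();

// The ?? fallbacks keep this runnable from the command line,
// where the request keys in $_SERVER are not set.
error_log(
    ($_SERVER['REQUEST_URI'] ?? '-') . "\n" .
    ($_SERVER['REMOTE_ADDR'] ?? '-') . "\n" .
    $summary
);
```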

Mac OS X 10.5 Comes With Apache 2 and PHP 5

Yep. Leopard comes with new stuff. Lazeez says it works fine, but commenters here are having trouble.
leopard, Mac OS X 10.5, apache, php




Speedy PHP: Intermediate Code Caching

I’ve been working on MySQL optimization for a while, and though there’s still more to be done on that front, I’ve gotten to the point where the cumulative query times make up less than half of the page generation time.
So I’m optimizing code when the solution is obvious (and I hope to rope Zach into [...]

PHP Libraries for Collaborative Filtering and Recommendations

Daniel Lemire and Sean McGrath note that “User personalization and profiling is key to many succesful Web sites. Consider that there is considerable free content on the Web, but comparatively few tools to help us organize or mine such content for specific purposes.” And they’ve written a paper and released prototype code on collaborative filtering.
Vogoo [...]

Parsing MARC Directory Info

I expected a record that looked like this:
LEADER 00000nas 2200000Ia 4500
001 18971047
008 890105c19079999mau u p 0uuua0eng
010 07023955 /rev
040 DLC|cAUG
049 PSMM
050 F41.5|b.A64
090 F41.5|b.A64
110 2 Appalachian Mountain Club
245 [...]
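
For reference, a rough sketch of reading a MARC directory in PHP, assuming the record follows the standard ISO 2709 layout: a 24-byte leader whose bytes 12-16 give the base address of the data, then 12-byte directory entries (3-byte tag, 4-byte field length, 5-byte starting offset), then a 0x1E terminator. The parse_marc_directory() helper and the tiny synthetic record are my own illustrations, not the record from this post.

```php
<?php
// parse_marc_directory() is a hypothetical helper; it assumes the ISO 2709
// layout: 24-byte leader, 12-byte directory entries, 0x1E field terminators.
function parse_marc_directory(string $record): array
{
    $base = (int) substr($record, 12, 5);   // leader bytes 12-16: base address of data
    $entries = [];
    // The directory runs from byte 24 up to the terminator just before $base.
    for ($pos = 24; $pos + 12 <= $base - 1; $pos += 12) {
        $length = (int) substr($record, $pos + 3, 4);  // includes the field terminator
        $start  = (int) substr($record, $pos + 7, 5);  // offset relative to $base
        $entries[] = [
            'tag'   => substr($record, $pos, 3),
            'value' => rtrim(substr($record, $base + $start, $length), "\x1E"),
        ];
    }
    return $entries;
}

// Build a tiny synthetic record (two control-style fields) so the sketch runs.
$fields = [['001', '18971047'], ['049', 'PSMM']];
$directory = '';
$data = '';
foreach ($fields as [$tag, $value]) {
    $field = $value . "\x1E";
    $directory .= $tag . sprintf('%04d%05d', strlen($field), strlen($data));
    $data .= $field;
}
$base   = 24 + strlen($directory) + 1;           // leader + directory + terminator
$total  = $base + strlen($data) + 1;             // plus the 0x1D record terminator
$leader = sprintf('%05d', $total) . 'nas  22' . sprintf('%05d', $base) . 'Ia 4500';
$record = $leader . $directory . "\x1E" . $data . "\x1D";

$entries = parse_marc_directory($record);
print_r($entries);
```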

PHP Array To XML

I needed a quick, perhaps even sloppy way to output an array as XML. Some Googling turned up a few tools, including Simon Willison’s XmlWriter, Johnny Brochard’s Array 2 XML, Roger Veciana’s Associative array to XML, and Gijs van Tulder’s Array to XML. Finally, Gijs also pointed me to the XML_Serializer PEAR Package.
In an example [...]
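
For comparison with those tools, here is a quick-and-sloppy sketch of my own using only SimpleXML. The array_to_xml() helper is hypothetical and not taken from any of the packages named above; it assumes string keys are already valid XML element names and wraps numerically indexed values in item elements.

```php
<?php
// array_to_xml() is a hypothetical helper, not one of the tools named above.
// It assumes string keys are already valid XML element names.
function array_to_xml(array $data, SimpleXMLElement $parent): void
{
    foreach ($data as $key => $value) {
        // Numeric keys are not valid element names, so fall back to <item>.
        $name = is_int($key) ? 'item' : $key;
        if (is_array($value)) {
            array_to_xml($value, $parent->addChild($name));
        } else {
            // addChild() does not escape ampersands itself, so escape first.
            $text = htmlspecialchars((string) $value, ENT_XML1 | ENT_NOQUOTES);
            $parent->addChild($name, $text);
        }
    }
}

// Hypothetical sample data.
$post = ['title' => 'PHP Array To XML', 'tags' => ['php', 'xml']];

$xml = new SimpleXMLElement('<post/>');
array_to_xml($post, $xml);
echo $xml->asXML();
```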

Things I Need To Incorporate Into Various Projects

memcached, a “highly effective caching daemon, …designed to decrease database load in dynamic web applications,” and the related PHP functions
pspell PHP functions related to aspell and this pspell overview from Zend
http_build_query, duh?
current connected mysql threads * unix load average = system busy; reduce operations when $system_busy > $x
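
The last item could be sketched roughly like this. The function names and the threshold are hypothetical; in a real application the thread count would presumably come from MySQL's Threads_connected status variable, which I've replaced with a hard-coded sample value so the sketch runs without a database.

```php
<?php
// system_busy() and should_throttle() are hypothetical names for illustration.
function system_busy(int $threadsConnected, float $loadAverage): float
{
    // connected MySQL threads * unix load average = system busy
    return $threadsConnected * $loadAverage;
}

function should_throttle(int $threadsConnected, float $loadAverage, float $threshold): bool
{
    return system_busy($threadsConnected, $loadAverage) > $threshold;
}

// sys_getloadavg() is not available on every platform, so fall back to 0.0.
$avg  = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
$load = $avg ? (float) $avg[0] : 0.0;

// Assumed sample value; a real app would read Threads_connected from MySQL.
$threads = 12;

// 50.0 is an arbitrary example threshold, standing in for $x above.
if (should_throttle($threads, $load, 50.0)) {
    echo "busy: skip the optional work\n";
} else {
    echo "ok: run everything\n";
}
```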

development, memcached, mysql, [...]

Dang addslashes() And GPC Magic Quotes

Somewhere in the WordPress code extra slashes are being added to my query terms.
I’ve turned GPC magic quotes off via a php_value magic_quotes_gpc 0 directive in the .htaccess file (we have far too much legacy code that nobody wants to touch to turn it off site-wide). And I know my code is doing one run [...]

T2000 Unboxed And Online

My Sun T2000 is here, and with Cliff’s help it’s now patched, configured, and online. (Aside: what’s a Sun Happy Meal?)
I’ll second Jon’s assessment that Sun really should put some reasonable cable adapters in the box, as the bundle of adapters necessary to make a null modem connection to the box is ridiculously out [...]

Solaris + AMP, ASAP

A Solaris sysadmin I’m not. But now that I’ve finally got the Sun T2000 server I begged for a while back, I’ve got to ramp it up right quick.
The first task is to get a, um, LAMP environment up and running (SAMP?…oh, Sun wants us to call it AMPS). A bit of Googling turned up [...]

PHP5’s SimpleXML Now Passes CDATA Content

I didn’t hear a big announcement of it, but deep in the docs (≥ PHP 5.1.0) you’ll find a note about additional Libxml parameters. In there you’ll learn about “LIBXML_NOCDATA,” and it works like this:
simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);
Without that option (and with all previous versions of PHP/SimpleXML), SimpleXML just ignores any <![CDATA[...]]> ‘escaped’ content, such as you’ll find [...]
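
A small sketch for trying the option out. The feed fragment below is a made-up sample, and $xmlraw just mirrors the variable name in the call above. Dumping both trees with print_r() is an easy way to compare how each one treats the CDATA section.

```php
<?php
// Hypothetical sample fragment with one CDATA-wrapped element.
$xmlraw = '<item><title>Plain</title>'
        . '<description><![CDATA[<b>bold</b> text]]></description></item>';

// Default parse: CDATA sections stay as separate CDATA nodes.
$plain = simplexml_load_string($xmlraw);

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$merged = simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);

// Compare how the two trees dump.
print_r($plain);
print_r($merged);

// With the flag set, the CDATA content reads back as plain text.
$description = (string) $merged->description;
echo $description, "\n";
```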

Performance Optimization

A couple of notes from the past few days of tweaks and fixes:

Hyper-threading has a huge effect on LAMP performance. 
From now on, I’ll have bad dreams about running MySQL without Query Caching in the way that I used to have nightmares about going to school wearing only my underwear. The difference is that big. 
WordPress rocks, but [...]