PHP5’s SimpleXML Now Passes CDATA Content

I didn’t hear a big announcement of it, but deep in the docs (≥ PHP 5.1.0) you’ll find a note about additional Libxml parameters. In there you’ll learn about “LIBXML_NOCDATA,” and it works like this:

simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);

Without that option (and with all previous versions of PHP/SimpleXML), SimpleXML just ignores any <![CDATA[...]]> ‘escaped’ content, such as you’ll find in most every blog feed.
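To put that one-liner in context, here is a minimal sketch of loading a feed-like document with the flag set. The feed fragment, its element names, and the `$description` variable are made up for illustration; only `simplexml_load_string()` and `LIBXML_NOCDATA` come from the PHP docs. How the same node reads without the flag can depend on your PHP version and on how you access it, so treat this as a starting point rather than a definitive test.

```php
<?php
// Hypothetical feed fragment; element names and content are invented for this example.
$xmlraw = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Example post</title>
      <description><![CDATA[<p>Hello, world</p>]]></description>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    exit("Could not parse the XML\n");
}

// Cast the node to a string to get the text that was wrapped in CDATA.
$description = (string) $xml->channel->item->description;
echo $description, "\n"; // prints: <p>Hello, world</p>
```

Because the CDATA section is now a plain text node, serializing the tree again with `asXML()` writes the markup as escaped entities rather than as a CDATA block.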


7 thoughts on “PHP5’s SimpleXML Now Passes CDATA Content”

  1. I wish more shared hosts would pick up PHP5. All of mine are still using PHP4 and I don’t have access to any of the new XML tools, and I’m afraid to write anything with PHP4 because they may upgrade and break my scripts. It’s nice to see what there is, even if I can’t use it.

    [tags]php5[/tags]

  2. Your comment system has some PHP errors:

    WordPress database error: [Duplicate entry '11257-php5' for key 2]
    INSERT INTO wp_bsuite_tags (`post_id`,`comment_id`,`tag`,`tag_raw`) VALUES ('11257', 35263,'php5', 'php5')

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 55

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 56

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 57

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-includes/pluggable-functions.php on line 247

    [tags]php errors[/tags]

  3. Googled looking for a solution to simpleXml skipping text wrapped in CDATA nodes. Got your page, and tossed in the libXml2 parameter. Worked great! Thanks bud!

  4. Thanks. This was doing my head in for a couple of hours yesterday. Thought my XML was at fault until I found this article. Glad they’ve fixed it. It’s a major oversight otherwise.

  5. For anyone who’s still stuck using PHP 5’s simplexml before v5.1.0 (like me), you can use a fairly simple regex to filter any cdata and collapse them into regular text nodes:

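    // Named callback (works on old and new PHP): escape one CDATA section's text.
    function cdata_escape($matches) {
        return htmlspecialchars($matches[1]);
    }

    // Collapse CDATA sections into escaped text; /s lets "." match newlines.
    function cdata_to_text($text) {
        return preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_escape', $text);
    }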
