PHP5’s SimpleXML Now Passes CDATA Content

I didn’t hear a big announcement of it, but deep in the docs (PHP ≥ 5.1.0) you’ll find a note about additional Libxml parameters. In there you’ll learn about “LIBXML_NOCDATA,” which merges CDATA sections into ordinary text nodes, and it works like this:

simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);

Without that option (and with all previous versions of PHP/SimpleXML), SimpleXML just ignores any <![CDATA[...]]> ‘escaped’ content, such as you’ll find in almost every blog feed.
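Here is a minimal sketch of the difference, assuming a made-up feed fragment (the `<item>`/`<title>` markup and the variable names `$plain` and `$merged` are illustrative, not from any real feed). How the unflagged tree looks can vary between PHP versions, so the sketch only relies on the result of the flagged call.

```php
<?php
// Hypothetical feed fragment; element names and content are illustrative only.
$xmlraw = '<item><title><![CDATA[Tom & Jerry <b>live</b>]]></title></item>';

// Default parse: CDATA sections stay as separate CDATA nodes.
$plain = simplexml_load_string($xmlraw);

// With LIBXML_NOCDATA, libxml merges CDATA sections into regular text nodes.
$merged = simplexml_load_string($xmlraw, 'SimpleXMLElement', LIBXML_NOCDATA);

// Dump both trees to compare how the <title> element is exposed.
var_dump($plain);
var_dump($merged);

// The merged tree yields the CDATA body as an ordinary string.
echo (string) $merged->title, "\n";
```

The same flag can be passed as the third argument to `simplexml_load_file()` when reading a feed from a file or URL.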

cdata, cdata in php, fixed, parsing rss, php, php5, rss, simplexml, xml

6 Comments

  1. Comment by Bjorn on April 14, 2006 1:11 am

    I wish more shared hosts would pick up PHP5. All of mine are still using PHP4 and I don’t have access to any of the new XML tools, and I’m afraid to write anything with PHP4 because they may upgrade and break my scripts. It’s nice to see what there is, even if I can’t use it.

    [tags]php5[/tags]

  2. Comment by Bjorn on April 14, 2006 1:17 am

    Your comment system has some PHP errors:

    WordPress database error: [Duplicate entry '11257-php5' for key 2]
    INSERT INTO wp_bsuite_tags (`post_id`,`comment_id`,`tag`,`tag_raw`) VALUES ('11257', 35263,'php5', 'php5')

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 55

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 56

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-comments-post.php on line 57

    Warning: Cannot modify header information - headers already sent by (output started at /home/mais04/public_html/blog/wp-includes/wp-db.php:102) in /home/mais04/public_html/blog/wp-includes/pluggable-functions.php on line 247

    [tags]php errors[/tags]

  3. Comment by Morgan on December 12, 2006 4:37 pm

    Googled looking for a solution to simpleXml skipping text wrapped in CDATA nodes. Got your page, and tossed in the libXml2 parameter. Worked great! Thanks bud!

  4. Comment by nico on March 6, 2007 9:32 am

    Thanks a lot, this article saved my nerves!

  5. Comment by James Constable on June 12, 2007 5:05 am

    Thanks. This was doing my head in for a couple of hours yesterday. Thought my XML was at fault until I found this article. Glad they’ve fixed it. It’s a major oversight otherwise.

  6. Comment by Evan K on July 11, 2007 11:27 am

    For anyone who’s still stuck using PHP 5’s simplexml before v5.1.0 (like me), you can use a fairly simple regex to filter any cdata and collapse them into regular text nodes:

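    // Callback: escape the captured CDATA body so it is safe as an XML text node.
    function cdata_escape($match) {
        return htmlspecialchars($match[1]);
    }

    function cdata_to_text($text) {
        // The /s flag lets "." span newlines; a callback replaces the unsafe /e modifier.
        return preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_escape', $text);
    }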
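A regex can trip over unusual input, so here is an alternative sketch that swaps in a different technique: load the document with the DOM extension, replace each CDATA section with a plain text node, and hand the result to SimpleXML. The helper name `load_without_cdata` and the sample XML are hypothetical, and this is only one possible way to approximate what LIBXML_NOCDATA does.

```php
<?php
// Hypothetical helper (not from the comment above): convert CDATA sections
// to regular text nodes via DOM, then import the tree into SimpleXML.
function load_without_cdata($xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Collect CDATA nodes first so the tree is not modified while iterating.
    $xpath = new DOMXPath($doc);
    $cdataNodes = array();
    foreach ($xpath->query('//text()') as $node) {
        if ($node->nodeType === XML_CDATA_SECTION_NODE) {
            $cdataNodes[] = $node;
        }
    }

    // Swap each CDATA section for a text node with the same content.
    foreach ($cdataNodes as $node) {
        $text = $doc->createTextNode($node->nodeValue);
        $node->parentNode->replaceChild($text, $node);
    }

    return simplexml_import_dom($doc);
}

// Illustrative input only.
$sx = load_without_cdata('<item><title><![CDATA[Tom & Jerry]]></title></item>');
echo (string) $sx->title, "\n";
```

Because the text node is created through the DOM API, special characters such as `&` are escaped on serialization without any manual entity handling.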
