Matching Multi-line Regex in BBEdit

I love BBEdit on my Mac, but I was left scratching my head again today when I was trying to remember how to make its regex engine match a pattern across multiple lines. My hope was to extract a list of initial articles from a page that had HTML like this:

 
      <table>
         <tr>
            <td valign="top" colspan="34" align="left">am</td>
            <td valign="top" colspan="10" align="left">Scottish Gaelic</td>
         </tr>
      </table>
      <table>
         <tr>
            <td valign="top" colspan="34" align="left">an</td>
            <td valign="top" colspan="10" align="left">English,</td>
            <td valign="top" colspan="10" align="left">Irish,</td>
            <td valign="top" colspan="10" align="left">Scots,</td>
            <td valign="top" colspan="10" align="left">Scottish Gaelic,</td>
            <td valign="top" colspan="10" align="left">Yiddish</td>
         </tr>
      </table>
      <table>
         <tr>
            <td valign="top" colspan="34" align="left">an t-</td>
            <td valign="top" colspan="10" align="left">Irish,</td>
            <td valign="top" colspan="10" align="left">Scottish Gaelic</td>
         </tr>
      </table>

The page has well over 100 tables like that, and I wanted the contents of the first TD in each. The following regex does it:

(?s)[^<]*<table>[^<]*<tr>[^<]*<td[^>]*>([^<]*)</td>.*?</table>

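The most significant part of this is the (?s) at the beginning, which tells BBEdit to let the dot (.) match line breaks. Without it, the .*? near the end would stop at the first line break and never reach the closing </table>. (The [^<]* pieces already cross line breaks on their own, since a negated character class matches newlines.) A more ninja-like regex assassin would probably be able to do it better, but this worked.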
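For anyone who wants to try the pattern outside BBEdit, here is a minimal sketch in Python, whose re module supports the same (?s) inline flag. The sample string is a trimmed copy of the tables above, and the variable names are only for illustration; this is not how BBEdit runs the search internally.

```python
import re

# Sample HTML shaped like the tables in the post (trimmed to two tables).
html = """
      <table>
         <tr>
            <td valign="top" colspan="34" align="left">am</td>
            <td valign="top" colspan="10" align="left">Scottish Gaelic</td>
         </tr>
      </table>
      <table>
         <tr>
            <td valign="top" colspan="34" align="left">an t-</td>
            <td valign="top" colspan="10" align="left">Irish,</td>
            <td valign="top" colspan="10" align="left">Scottish Gaelic</td>
         </tr>
      </table>
"""

# The pattern from the post: (?s) lets "." match line breaks.
with_flag = r"(?s)[^<]*<table>[^<]*<tr>[^<]*<td[^>]*>([^<]*)</td>.*?</table>"

# The same pattern with the (?s) flag removed, for comparison.
without_flag = with_flag.replace("(?s)", "", 1)

# findall returns the contents of the single capture group for each match.
articles = re.findall(with_flag, html)
no_dotall = re.findall(without_flag, html)

print(articles)   # first TD of each table
print(no_dotall)  # empty: ".*?" cannot cross the line break after </td>
```

In BBEdit itself, the capture group is available as \1 in the Replace field, so replacing each match with \1 followed by a line break should leave just the list of articles.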

One thought on “Matching Multi-line Regex in BBEdit”

  1. On Windows, I use biterscripting to parse across multiple lines. Just read the contents of the entire file into a string variable.

    For example, I have a file page.html. I want to extract a block starting at .

    var str content ; cat page.html > $content
    stex -r “^^” $content

    The above will extract the block across multiple lines.

    For example,

    stex -r “^^” “a\n\nd”

    will extract “” .

    Have fun with regular expressions.

    Patrick
