I love BBEdit on my Mac, but I was left scratching my head again today when I was trying to remember how to make its regex engine match a pattern across multiple lines. My hope was to extract a list of initial articles from a page that had HTML like this:
<table>
<tr>
<td valign="top" colspan="34" align="left">
am
</td>
<td valign="top" colspan="10" align="left">
Scottish Gaelic
</td>
</tr>
</table>
<table>
<tr>
<td valign="top" colspan="34" align="left">
an
</td>
<td valign="top" colspan="10" align="left">
English,
</td>
<td valign="top" colspan="10" align="left">
Irish,
</td>
<td valign="top" colspan="10" align="left">
Scots,
</td>
<td valign="top" colspan="10" align="left">
Scottish Gaelic,
</td>
<td valign="top" colspan="10" align="left">
Yiddish
</td>
</tr>
</table>
<table>
<tr>
<td valign="top" colspan="34" align="left">
an t-
</td>
<td valign="top" colspan="10" align="left">
Irish,
</td>
<td valign="top" colspan="10" align="left">
Scottish Gaelic
</td>
</tr>
</table>
In fact, the page has well over 100 tables like these, and I wanted the contents of the first TD in each. The following regex does it:
(?s)[^<]*<table>[^<]*<tr>[^<]*<td[^>]*>([^<]*)</td>.*?</table>
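The most significant part of this is the (?s) at the beginning, which makes the dot match line breaks too, so that the .*? before </table> can run across multiple lines. (The [^<]* pieces already cross line breaks on their own, since a negated character class matches newlines.) A more ninja-like regex assassin would probably be able to do it better, but this worked.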
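For anyone who wants to try the pattern outside BBEdit, here is a minimal sketch in Python, whose re module also accepts the inline (?s) flag. The sample is a trimmed copy of the HTML above, and the names SAMPLE_HTML, PATTERN, and first_cells are made up for illustration. It also runs the same pattern without (?s) to show that the flag is what lets the match succeed.

```python
import re

# Trimmed copy of the sample HTML from the post (fewer language cells).
SAMPLE_HTML = """<table>
<tr>
<td valign="top" colspan="34" align="left">
am
</td>
<td valign="top" colspan="10" align="left">
Scottish Gaelic
</td>
</tr>
</table>
<table>
<tr>
<td valign="top" colspan="34" align="left">
an
</td>
<td valign="top" colspan="10" align="left">
English,
</td>
</tr>
</table>
<table>
<tr>
<td valign="top" colspan="34" align="left">
an t-
</td>
<td valign="top" colspan="10" align="left">
Irish,
</td>
</tr>
</table>
"""

# Same pattern as in the post; (?s) lets "." match line breaks.
PATTERN = re.compile(
    r"(?s)[^<]*<table>[^<]*<tr>[^<]*<td[^>]*>([^<]*)</td>.*?</table>"
)

# Identical pattern minus the (?s) flag, for comparison.
PATTERN_NO_FLAG = re.compile(
    r"[^<]*<table>[^<]*<tr>[^<]*<td[^>]*>([^<]*)</td>.*?</table>"
)


def first_cells(html, pattern=PATTERN):
    """Return the text of the first TD in each table, whitespace trimmed."""
    return [m.group(1).strip() for m in pattern.finditer(html)]


articles = first_cells(SAMPLE_HTML)
without_flag = first_cells(SAMPLE_HTML, PATTERN_NO_FLAG)

print(articles)      # ['am', 'an', 'an t-']
print(without_flag)  # []
```

Without the flag nothing matches at all, because every </td> in this markup is followed by a line break that the plain dot refuses to cross.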