Regular Expressions 101

torstatus.blutmagie.de list parser

Regular Expression
Python

r"
<tr class='(.)'><td class='TR(.)'><a href='http://www\.openstreetmap\.org\/\?mlon=([-\d\.]+)&mlat=([-\d\.]+)&zoom=\d+' target='_blank'><img src='img\/flags\/(\w+)\.gif' class='flag' width='\d+px' title='(.+)' alt='\w+' border='0'><\/a>&nbsp;<a href='router_detail\.php\?FP=([a-zA-Z0-9]+)' target='_blank'>([^\<]+)<\/a><\/td><td class='TDb'><table cellspacing='0' cellpadding='0' class='bwb'><tr title='([^']+)'><td class='bwr.?'><img src='img/bar/\d+.png' width='\d+px' height='15px' alt='\d+'></td><td>&nbsp;<small>&nbsp;\d+</small></td></tr></table></td><td class='TDc.?'>([^<]+)</td><td class='TDS'><table class='iT'><tr><td class='iT'>([^\[]+)\[<a class='who' href='/cgi\-bin/whois.pl\?ip=[\d\.]+' target='_blank'>([\d\.]+)</a>\]</td>(<td><img src='img/status/Fast.png' title='Fast Server' alt='Fast Server'></td>)?(<td><img src='img/status/Exit.png' title='Exit Server' alt='Exit Server'></td>)?(<td><img src='img/status/Dir.png' title='Directory Server' alt='Directory Server'></td>)?(<td><img src='img/status/Guard.png' title='Guard Server' alt='Guard Server'></td>)?(<td><img src='img/status/Stable.png' title='Stable Server' alt='Stable Server'></td>)?(<td><img src='img/status/Authority.png' title='Authority Server' alt='Authority Server'/></td>)?<td><img src='[^']+' title='([^']+)' alt='[^']+'></td>(<td><img src='[^']+' title='([^']*)' alt='[^']*'></td>)?</tr></table></td><td class='TDc'>(<b>)?(\d*)(</b>)?</td><td class='TDc'>(<b>)?([None\d]*)(</b>)?</td><td class='(F\d)'></td></tr>
"
g

Description

Parses most columns of each row in the torstatus.blutmagie.de router list for Tor exit nodes.

Submitted by bandoche - 10 years ago
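
A minimal sketch of how a pattern like this might be applied in Python. Python's `re` module has no `g` flag; iterating with `finditer` (or calling `findall`) plays that role. To keep the example short, it uses only the opening fragment of the pattern above (map link, flag image, router link), and narrows the flag `title` group from `(.+)` to `([^']+)`, which is a change from the submitted pattern. The names `ROW_START`, `SAMPLE`, and `parse_rows` are hypothetical, and the sample row is made up for illustration, not copied from the live site.

```python
import re

# Opening fragment of the submitted pattern: coordinates, country flag,
# fingerprint, and nickname. The title group is narrowed to [^']+ here
# (an assumption of this sketch; the original uses a greedy .+).
ROW_START = re.compile(
    r"<a href='http://www\.openstreetmap\.org/\?mlon=([-\d.]+)&mlat=([-\d.]+)"
    r"&zoom=\d+' target='_blank'>"
    r"<img src='img/flags/(\w+)\.gif' class='flag' width='\d+px' "
    r"title='([^']+)' alt='\w+' border='0'></a>"
    r"&nbsp;<a href='router_detail\.php\?FP=([a-zA-Z0-9]+)' target='_blank'>"
    r"([^<]+)</a>"
)

# Hypothetical row, invented for this example.
SAMPLE = (
    "<tr class='r'><td class='TRr'>"
    "<a href='http://www.openstreetmap.org/?mlon=13.4&mlat=52.5&zoom=6' "
    "target='_blank'>"
    "<img src='img/flags/de.gif' class='flag' width='18px' "
    "title='Germany' alt='de' border='0'></a>"
    "&nbsp;<a href='router_detail.php?FP=abcdef0123456789' "
    "target='_blank'>examplerelay</a></td>"
)


def parse_rows(html):
    """Return one dict per matched row fragment found in html."""
    rows = []
    # finditer scans the whole input, like the g flag on regex101.
    for match in ROW_START.finditer(html):
        lon, lat, country_code, country, fingerprint, nickname = match.groups()
        rows.append({
            "lon": float(lon),
            "lat": float(lat),
            "country_code": country_code,
            "country": country,
            "fingerprint": fingerprint,
            "nickname": nickname,
        })
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

The optional status groups in the full pattern (Fast, Exit, Dir, Guard, Stable, Authority) come back as `None` when the image is absent, so a check such as `match.group(n) is not None` would turn each into a boolean. Because the pattern is tied to the exact markup of the page, any change in attribute order or quoting will make it stop matching; an HTML parser is likely to be more robust if the layout is not stable.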