Regular Expressions 101

Validate Wikipedia URL and extract Article Title


Regular Expression
ECMAScript (JavaScript)

^((http:|https:){0,1}\/\/.*\.wikipedia\.org){0,1}(\/wiki\/|\/w\/){1}(index\.php\?title=){0,1}(?!User:|Wikipedia:|WP:|Project:|File:|Image:|MediaWiki:|MW:|Template:|Help:|Category:|Portal:|Draft:|TimedText:|Module:|Gadget:|Gadget definition:|Topic:|Education Program:|Book:|WT:|Special:|Wikipedia Talk:|Talk:|H:|CAT:|User talk:|Image Talk:|MOS:|P:|T:|Main_Page)(?<ARTICLENAME>(?!index\.php\?title=)[^&?\n]*)[&?]?.*$


This pattern works against a list of hyperlinks taken from the 'USS Marmora (1862)' article.
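
As a minimal sketch of how the pattern might be applied in JavaScript, the snippet below runs it over a few links of the kind found in that article and reads the ARTICLENAME named group. The helper name extractArticleTitle and the sample links are assumptions made for illustration, not the actual test list.

```javascript
// Copy of the pattern above. Named groups need an ES2018+ engine.
// Inside a character class "|" is a literal pipe, not alternation, so the
// two classes are written here as [^&?\n] and [&?].
const WIKI_ARTICLE = /^((http:|https:){0,1}\/\/.*\.wikipedia\.org){0,1}(\/wiki\/|\/w\/){1}(index\.php\?title=){0,1}(?!User:|Wikipedia:|WP:|Project:|File:|Image:|MediaWiki:|MW:|Template:|Help:|Category:|Portal:|Draft:|TimedText:|Module:|Gadget:|Gadget definition:|Topic:|Education Program:|Book:|WT:|Special:|Wikipedia Talk:|Talk:|H:|CAT:|User talk:|Image Talk:|MOS:|P:|T:|Main_Page)(?<ARTICLENAME>(?!index\.php\?title=)[^&?\n]*)[&?]?.*$/;

// Hypothetical helper: returns the raw (still URL-encoded) article title,
// or null when the link is not an article link.
function extractArticleTitle(href) {
  const match = WIKI_ARTICLE.exec(href);
  return match ? match.groups.ARTICLENAME : null;
}

// Illustrative sample links (assumed, not the author's list).
const samples = [
  'https://en.wikipedia.org/wiki/USS_Marmora_(1862)',
  '//en.wikipedia.org/wiki/Mississippi_River',
  'https://en.wikipedia.org/w/index.php?title=USS_Marmora_(1862)&action=edit',
  '/wiki/Special:BookSources/978-1-59114-882-1',
  '/wiki/Main_Page',
];

for (const href of samples) {
  console.log(href, '->', extractArticleTitle(href));
}
```

With these samples, the first three links yield a title and the last two yield null, because Special: and Main_Page are excluded by the negative lookahead.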

You can include File: pages by removing File:| from the negative lookahead in the regex.
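
One way to make that adjustment without hand-editing the pattern is to assemble the negative lookahead from a list, roughly as sketched below. The helper buildWikiPattern, the shortened prefix list, and the sample file link are assumptions for illustration; the full pattern excludes many more namespaces.

```javascript
// Abbreviated stand-in for the full exclusion list (shortened for brevity).
const EXCLUDED_PREFIXES = [
  'User:', 'Wikipedia:', 'File:', 'Image:', 'Talk:', 'Special:', 'Main_Page',
];

// Hypothetical helper: builds the pattern with a custom exclusion list.
// Prefixes are inserted unescaped, so they must not contain regex
// metacharacters.
function buildWikiPattern(excludedPrefixes) {
  // An empty lookahead "(?!)" never matches, so omit it for an empty list.
  const lookahead = excludedPrefixes.length
    ? `(?!${excludedPrefixes.join('|')})`
    : '';
  return new RegExp(
    String.raw`^((http:|https:)?\/\/.*\.wikipedia\.org)?(\/wiki\/|\/w\/)(index\.php\?title=)?` +
    lookahead +
    String.raw`(?<ARTICLENAME>(?!index\.php\?title=)[^&?\n]*)[&?]?.*$`
  );
}

const strict = buildWikiPattern(EXCLUDED_PREFIXES);
const withFiles = buildWikiPattern(
  EXCLUDED_PREFIXES.filter((prefix) => prefix !== 'File:')
);

const fileLink = '/wiki/File:USS_Marmora.jpg'; // made-up file name
console.log(strict.test(fileLink)); // false
console.log(withFiles.exec(fileLink).groups.ARTICLENAME); // File:USS_Marmora.jpg
```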

I could improve how the remaining query string is handled for URLs containing index.php; for now, I recommend splitting it in your chosen programming language.
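
In JavaScript, that split could be done with the built-in URL and URLSearchParams APIs rather than with more regex. The sketch below is one possible approach; the helper name splitIndexPhpQuery and the base URL used to resolve relative links are assumptions.

```javascript
// Hypothetical helper: separates the title parameter of an index.php link
// from the rest of the query string.
function splitIndexPhpQuery(href) {
  // The base is only used for relative or protocol-relative links (assumed host).
  const url = new URL(href, 'https://en.wikipedia.org');
  // searchParams percent-decodes values; repeated keys keep the last value here.
  const { title = null, ...rest } = Object.fromEntries(url.searchParams);
  return { title, rest };
}

const parts = splitIndexPhpQuery(
  '/w/index.php?title=USS_Marmora_(1862)&action=edit'
);
console.log(parts.title); // USS_Marmora_(1862)
console.log(parts.rest);  // { action: 'edit' }
```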

According to guidelines at

Submitted by Martin Shaw - 9 months ago (Last modified 9 months ago)