Regular Expressions 101

Validate Wikipedia URL and extract Article Title

Regular Expression
ECMAScript (JavaScript)

/^((http:|https:){0,1}\/\/.*\.wikipedia\.org){0,1}(\/wiki\/|\/w\/){1}(index\.php\?title=){0,1}(?!User:|Wikipedia:|WP:|Project:|File:|Image:|MediaWiki:|MW:|Template:|Help:|Category:|Portal:|Draft:|TimedText:|Module:|Gadget:|Gadget definition:|Topic:|Education Program:|Book:|WT:|Special:|Wikipedia Talk:|Talk:|H:|CAT:|User talk:|Image Talk:|MOS:|P:|T:|Main_Page)(?<ARTICLENAME>(?!index\.php\?title=)[^&|?|\n]*)[&|?]?.*$/miug

Description

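This pattern works against the list of hyperlinks in the 'USS Marmora (1862)' article. The article title is captured in the named group ARTICLENAME; links into non-article namespaces (User:, File:, Special: and so on) and links to Main_Page are rejected.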
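As a rough illustration of running the pattern over a list of links, the sketch below applies it to a newline-separated string and collects the ARTICLENAME group from each match. The function name extractArticleTitles and the sample links are assumptions made for this example, not part of the original submission.

```javascript
// The pattern exactly as submitted, with the same flags (m, i, u, g).
const WIKI_ARTICLE = /^((http:|https:){0,1}\/\/.*\.wikipedia\.org){0,1}(\/wiki\/|\/w\/){1}(index\.php\?title=){0,1}(?!User:|Wikipedia:|WP:|Project:|File:|Image:|MediaWiki:|MW:|Template:|Help:|Category:|Portal:|Draft:|TimedText:|Module:|Gadget:|Gadget definition:|Topic:|Education Program:|Book:|WT:|Special:|Wikipedia Talk:|Talk:|H:|CAT:|User talk:|Image Talk:|MOS:|P:|T:|Main_Page)(?<ARTICLENAME>(?!index\.php\?title=)[^&|?|\n]*)[&|?]?.*$/miug;

// Hypothetical helper: returns the captured title of every line that matches.
// matchAll works on a copy of the regex, so lastIndex on WIKI_ARTICLE is untouched.
function extractArticleTitles(text) {
  return Array.from(text.matchAll(WIKI_ARTICLE), (m) => m.groups.ARTICLENAME);
}

// Sample input (made up for illustration): one link per line.
const sampleLinks = [
  "https://en.wikipedia.org/wiki/USS_Marmora_(1862)",
  "/wiki/Tinclad",
  "/wiki/File:Example.jpg",                      // rejected: File: namespace
  "/w/index.php?title=Yazoo_River&action=edit",
  "https://en.wikipedia.org/wiki/Special:Random" // rejected: Special: namespace
].join("\n");

console.log(extractArticleTitles(sampleLinks));
// [ 'USS_Marmora_(1862)', 'Tinclad', 'Yazoo_River' ]
```

The titles come back exactly as they appear in the URL (underscores and percent-encoding included), so decoding them is left to the caller.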

To include File: pages, remove File:| from the negative lookahead in the regex.
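One possible way to make that change without hand-editing the long pattern is to derive a variant from its source text. This is only a sketch: the names WIKI_ARTICLE, WIKI_ARTICLE_WITH_FILES and firstTitle are assumptions introduced here.

```javascript
// The pattern exactly as submitted.
const WIKI_ARTICLE = /^((http:|https:){0,1}\/\/.*\.wikipedia\.org){0,1}(\/wiki\/|\/w\/){1}(index\.php\?title=){0,1}(?!User:|Wikipedia:|WP:|Project:|File:|Image:|MediaWiki:|MW:|Template:|Help:|Category:|Portal:|Draft:|TimedText:|Module:|Gadget:|Gadget definition:|Topic:|Education Program:|Book:|WT:|Special:|Wikipedia Talk:|Talk:|H:|CAT:|User talk:|Image Talk:|MOS:|P:|T:|Main_Page)(?<ARTICLENAME>(?!index\.php\?title=)[^&|?|\n]*)[&|?]?.*$/miug;

// Drop the "File:|" alternative from the negative lookahead and rebuild
// the regex with the same flags. A string replace removes the first
// occurrence only, which is the one inside the lookahead.
const WIKI_ARTICLE_WITH_FILES = new RegExp(
  WIKI_ARTICLE.source.replace("File:|", ""),
  WIKI_ARTICLE.flags
);

// Hypothetical helper: captured title of the first match, or null if none.
function firstTitle(regex, text) {
  const [match] = text.matchAll(regex);
  return match ? match.groups.ARTICLENAME : null;
}

console.log(firstTitle(WIKI_ARTICLE, "/wiki/File:Example.jpg"));            // null
console.log(firstTitle(WIKI_ARTICLE_WITH_FILES, "/wiki/File:Example.jpg")); // File:Example.jpg
```

The other excluded prefixes, such as Image:, are unaffected and are still rejected by the derived pattern.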

The handling of the remaining query string in URLs containing index.php could be improved; for now, I recommend splitting it in your chosen programming language.
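The sketch below shows one reading of that advice: take everything after the ? in an index.php link and split it on & and = to separate the title from the other parameters. The function name splitIndexPhpLink and the sample link are assumptions for illustration. Values are left undecoded, and fragments (#...) are not handled.

```javascript
// Hypothetical helper: splits the query string of an index.php link into
// the title parameter and a plain object holding the remaining parameters.
function splitIndexPhpLink(link) {
  const params = {};
  let title = null;

  const queryStart = link.indexOf("?");
  if (queryStart === -1) {
    return { title, params }; // no query string at all
  }

  for (const pair of link.slice(queryStart + 1).split("&")) {
    if (pair === "") continue; // tolerate "&&" or a trailing "&"
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    if (key === "title") {
      title = value;
    } else {
      params[key] = value;
    }
  }
  return { title, params };
}

console.log(splitIndexPhpLink("/w/index.php?title=Yazoo_River&action=edit&oldid=123"));
// { title: 'Yazoo_River', params: { action: 'edit', oldid: '123' } }
```

For absolute URLs, the built-in URL and URLSearchParams classes are an alternative that also takes care of percent-decoding.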

The pattern follows the URL guidelines at https://en.wikipedia.org/wiki/Help:URL

Submitted by Martin Shaw - 8 months ago (Last modified 8 months ago)