Regular Expressions 101

Community Patterns

Preprocess text files to keep only readable words and text

Regular Expression
Python

r"^\W|^\d|^[A-Z ]+$|^[tT]able.+|^[Ff]igure.+|^[fF][aA][Xx].+|^[Ee]mail.+|^EMAIL.+|\.[\w]+\.|\d[\+\-\\\*\/]\d|[\(\)\+\-\\\*\/] [\(\)\+\-\\\*\/]"
Flags: mg (multiline, global)

Description

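For cleaning large text corpora. The pattern flags noisy text: lines that start with a non-word character or a digit, lines made up only of capital letters and spaces, lines that start with a table, figure, fax, or email label, dotted tokens such as the ".example." in "www.example.com", arithmetic between digits such as "3+4", and pairs of operators or parentheses separated by a space.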
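A minimal sketch of how the pattern might be applied in Python. It assumes the goal is to drop every line the pattern matches anywhere, which the entry does not state explicitly. The names `NOISE` and `clean_lines` are hypothetical and not part of the submission.

```python
import re

# The pattern from this entry, split across string literals for readability.
# re.MULTILINE corresponds to the "m" flag. The "g" flag has no Python
# counterpart: repeated matching is done by findall/finditer/sub instead.
NOISE = re.compile(
    r"^\W|^\d|^[A-Z ]+$|^[tT]able.+|^[Ff]igure.+|^[fF][aA][Xx].+"
    r"|^[Ee]mail.+|^EMAIL.+|\.[\w]+\.|\d[\+\-\\\*\/]\d"
    r"|[\(\)\+\-\\\*\/] [\(\)\+\-\\\*\/]",
    re.MULTILINE,
)


def clean_lines(text):
    """Return the lines of `text` in which the pattern finds no match.

    Assumption: a matching line is treated as noise and dropped whole.
    """
    return [line for line in text.splitlines() if not NOISE.search(line)]


sample = "\n".join([
    "INTRODUCTION",                    # only capitals: matched
    "The quick brown fox jumps.",      # kept
    "Table 1 shows results",           # table label: matched
    "Visit www.example.com today",     # dotted token: matched
    "Results improve with more data",  # kept
    "x = 3+4",                         # arithmetic between digits: matched
])
print(clean_lines(sample))
# ['The quick brown fox jumps.', 'Results improve with more data']
```

Note that empty lines are not matched by any alternative, so this sketch leaves them in place. Filter them separately if the corpus should not contain blank lines.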

Submitted by aliabbas petiwala - 8 years ago