Regular Expressions 101

Select Chapters and Bad Headers

Regular Expression
PCRE2 (PHP >=7.3)

/^((Chapter|Section|\w{3}logue|Interlude|Number|Part|Act)? ?([0-9]*\.?|[\W]+|(Ten|Eleven|Twelve|\w{3,5}teen|((Twen|Thir|Four|Fif|Six|Seven|Eigh|Nine)ty)?( |\-)?(One|Two|Three|Four|Five|Six|Seven|Eight|Nine)?)))[^\n]? ?\W?\s*\n/mgi

Description

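Selects generic, repetitive chapter headings, section headers, and similar markers (for example "Chapter 1", "Part Two", or "Prologue") so that they can optionally be replaced with a dinkus, a separator line such as "***". Does not select unique chapter titles such as "Chapter 1: The Beginning".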

Useful for quickly cleaning literary datasets for finetuning a model or module. Originally designed for use with NovelAI.
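
As a rough illustration of the cleanup described above, here is a minimal Python sketch. It assumes the pattern can be passed unchanged to Python's `re` module (it appears to use no PCRE2-only syntax), maps the `m` and `i` flags to `re.MULTILINE` and `re.IGNORECASE`, and relies on `re.sub` replacing every match to stand in for `g`. The `replace_headers` name, the "***" dinkus, and the sample text are hypothetical and not part of the original submission.

```python
import re

# The pattern from this page, split across several string literals for readability.
# m -> re.MULTILINE, i -> re.IGNORECASE; "g" is implied by re.sub replacing all matches.
PATTERN = re.compile(
    r"^((Chapter|Section|\w{3}logue|Interlude|Number|Part|Act)? ?"
    r"([0-9]*\.?|[\W]+|(Ten|Eleven|Twelve|\w{3,5}teen|"
    r"((Twen|Thir|Four|Fif|Six|Seven|Eigh|Nine)ty)?( |\-)?"
    r"(One|Two|Three|Four|Five|Six|Seven|Eight|Nine)?)))"
    r"[^\n]? ?\W?\s*\n",
    re.MULTILINE | re.IGNORECASE,
)


def replace_headers(text: str, dinkus: str = "***") -> str:
    """Replace every generic header line matched by PATTERN with a dinkus line.

    Hypothetical helper: the dinkus string is an assumption, not taken from the page.
    """
    # Each match includes its trailing newline, so the replacement adds one back.
    return PATTERN.sub(dinkus + "\n", text)


if __name__ == "__main__":
    sample = (
        "Chapter 1\n"
        "It was a dark night.\n"
        "Chapter Two\n"
        "The storm passed.\n"
    )
    print(replace_headers(sample))
```

One thing to watch for when adapting this: every part of the pattern before the final newline is optional, so an empty line also matches, and any blank lines following a matched header are consumed by the trailing `\s*`. Blank lines in the input may therefore need separate handling if they should be preserved.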

Submitted by lion - 3 years ago (Last modified 3 years ago)