Regular Expressions 101

Community Patterns

Match each Unicode character, symbol, and emoji consecutively


Regular Expression
ECMAScript (JavaScript)

/(?:\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*(?:\p{Join_Control}\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*)*|\s|.)\p{M}*/guy

Description

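This RegExp matches each (combined) Unicode character, symbol, or emoji consecutively. Combining marks, emoji modifiers (skin tones), and join-control sequences (such as ZWJ emoji sequences) stay attached to their base character, and the sticky (y) flag makes each match start exactly where the previous one ended, so nothing is skipped.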
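A minimal usage sketch, assuming a JavaScript runtime with Unicode property escapes (ES2018+) and the ?? operator (ES2020+). The helper name splitSymbols is hypothetical and not part of the pattern; it simply collects every match with String.prototype.match.

```javascript
// The community pattern, copied verbatim. Flags: g (return all matches),
// u (Unicode mode, required for \p{...}), y (sticky: each match must start
// exactly where the previous one ended).
const SYMBOL_RE =
  /(?:\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*(?:\p{Join_Control}\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*)*|\s|.)\p{M}*/guy;

// Hypothetical helper: split a string into the (combined) symbols the pattern matches.
function splitSymbols(text) {
  // With the g flag, match() returns every match, or null if there is none
  // (for example, for the empty string), so fall back to an empty array.
  return text.match(SYMBOL_RE) ?? [];
}

// "e" + combining acute accent, a space, thumbs-up + medium skin tone,
// a space, and a man-woman-girl family joined with ZWJ (U+200D).
const sample =
  "e\u0301 \u{1F44D}\u{1F3FD} \u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(splitSymbols(sample));
```

For this sample the result should have five pieces: the decomposed "é", a space, the thumbs-up with its skin-tone modifier, another space, and the whole family sequence.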

Note: in some edge cases (for example, regional-indicator flag pairs and CR LF) it does not combine code points that form a single symbol according to the default grapheme cluster boundaries of UAX #29 (Unicode Text Segmentation). For information on individual Unicode code points, see codepoints.net.
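To see such edge cases, one option is to compare the pattern against Intl.Segmenter with granularity "grapheme", which in common implementations follows the UAX #29 extended grapheme cluster rules. This is a sketch assuming a runtime that ships Intl.Segmenter (for example, a recent Node.js or browser); regexPieces and graphemePieces are hypothetical helper names.

```javascript
const SYMBOL_RE =
  /(?:\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*(?:\p{Join_Control}\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*)*|\s|.)\p{M}*/guy;

// Pieces produced by the community pattern.
function regexPieces(text) {
  return text.match(SYMBOL_RE) ?? [];
}

// Pieces produced by Intl.Segmenter at grapheme granularity
// (assumes the runtime provides Intl.Segmenter).
function graphemePieces(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const cases = {
  flag: "\u{1F1E9}\u{1F1EA}", // regional indicators D + E (flag of Germany)
  crlf: "\r\n",               // carriage return + line feed
};

for (const [name, text] of Object.entries(cases)) {
  console.log(name, regexPieces(text).length, graphemePieces(text).length);
}
```

Both cases should print 2 pieces for the pattern and 1 for Intl.Segmenter: neither pair is joined by a combining mark, emoji modifier, or join control, which are the only things the pattern attaches to a preceding code point.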

Also see this Stack Overflow post on splitting JavaScript strings into Unicode symbols, which inspired this RegExp (I also needed it for a project).
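For context on that splitting problem, here is a small sketch contrasting two common built-in approaches with the pattern: split("") cuts a string into UTF-16 code units, and spreading it into an array yields code points, while the pattern keeps the combined symbol whole. The sample string and variable names are illustrative assumptions.

```javascript
const SYMBOL_RE =
  /(?:\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*(?:\p{Join_Control}\p{Extended_Pictographic}[\p{Emoji_Modifier}\p{M}]*)*|\s|.)\p{M}*/guy;

// Thumbs-up followed by a skin-tone modifier: two code points, both outside
// the Basic Multilingual Plane, so four UTF-16 code units in total.
const thumbsUp = "\u{1F44D}\u{1F3FD}";

const byCodeUnit = thumbsUp.split("");            // 4 pieces (lone surrogates)
const byCodePoint = [...thumbsUp];                // 2 pieces (emoji, modifier)
const bySymbol = thumbsUp.match(SYMBOL_RE) ?? []; // 1 piece (combined emoji)

console.log(byCodeUnit.length, byCodePoint.length, bySymbol.length);
```

This should print 4 2 1. Spreading is already enough for most single emoji and characters; the pattern matters once modifiers, combining marks, or ZWJ sequences appear.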

Submitted by MAZ01001 - 7 months ago