Community Patterns


Validate Thai initial consonants, final consonants, vowels, and tone marks

Created·2026-01-22 01:36
Updated·2026-01-23 12:42
Flavor·ECMAScript (JavaScript)
Checks for an initial consonant (required), checks for a final consonant after vowels that require one, and checks the placement of Thai vowels and tone marks. Note: validating final consonants in Thai is difficult because Thai is written continuously, with no clear word boundaries, so readers rely on the meaning of the words to decide how to segment the text. For example, "ตากลม" can be read as "ตาก-ลม" or as "ตา-กลม". A regex can therefore help only to a degree, and it will sometimes be wrong. Still, it can serve as a supplementary checking tool that covers roughly 80% of the possible cases. I hope this addition is at least somewhat useful.
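The submitter's pattern is not reproduced on this page, so the following is only a minimal sketch of two of the checks described above (a required initial consonant and tone-mark placement). The character ranges, the names `startsWithInitial`, `misplacedTone`, and `looksWellFormed`, and the rule that a tone mark must follow a consonant with at most one upper or lower vowel in between are all assumptions made for illustration. It does not attempt the final-consonant check, which the note above explains is ambiguous.

```javascript
// Hypothetical character classes, not the submitter's pattern.
const CONSONANT = '[\\u0E01-\\u0E2E]';                 // ก to ฮ (the range also contains ฤ and ฦ)
const LEADING_VOWEL = '[\\u0E40-\\u0E44]';             // เ แ โ ใ ไ, written before the consonant
const UPPER_LOWER_VOWEL = '[\\u0E31\\u0E34-\\u0E39]';  // vowel signs written above or below
const TONE_MARK = '[\\u0E48-\\u0E4B]';                 // the four tone marks

// The text must begin with a consonant, optionally preceded by a leading vowel.
const startsWithInitial = new RegExp(`^${LEADING_VOWEL}?${CONSONANT}`, 'u');

// A tone mark is flagged when it is NOT preceded by a consonant
// (with at most one upper/lower vowel in between).
const misplacedTone = new RegExp(
  `(?<!${CONSONANT}${UPPER_LOWER_VOWEL}?)${TONE_MARK}`,
  'u'
);

function looksWellFormed(text) {
  return startsWithInitial.test(text) && !misplacedTone.test(text);
}

console.log(looksWellFormed('ตากลม'));              // true
console.log(looksWellFormed('\u0E17\u0E35\u0E48')); // true: consonant, upper vowel, tone mark
console.log(looksWellFormed('\u0E01\u0E32\u0E48')); // false: tone mark follows the vowel sign U+0E32
```

The lookbehind requires an engine with ES2018 lookbehind support.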
Submitted by อธิปัตย์ ล้อวงศ์งาม

Regex for Matching Documentation Websites

Created·2024-11-24 01:45
Flavor·ECMAScript (JavaScript)
This repository contains a regular expression designed to match URLs that commonly point to documentation-related websites. The regex favors flexibility, covering a range of terms and URL patterns.

Regex Pattern

^.*(?:\.|\/)(docs|documentation|help|guide|manual|reference|api|kb|support|resources|wiki|developer|how-to|tutorials|examples|learn|instructions)(?:\.|\/)?.*$

Purpose

This regex is intended to identify URLs that contain keywords associated with documentation or support websites. It handles common patterns in subdomains, directories, and file paths.

Explanation

^.*: Matches any characters at the beginning of the URL (any prefix).
(?:\.|\/): Matches either a period (.) or a forward slash (/) preceding the keyword.
(docs|documentation|help|guide|manual|...): Matches any of the keywords listed in the group.
(?:\.|\/)?: Allows an optional period (.) or forward slash (/) following the keyword.
.*$: Matches any characters following the keyword (any suffix).

Examples

Positive Examples

The following URLs should match the regex:

https://example.com/docs
http://docs.example.com
https://example.com/documentation
https://sub.domain.com/docs/index.html
https://example.com/help
https://api.example.com/docs
http://example.com/manual/index.html
https://wiki.example.com
http://developer.example.com/guide
https://example.com/tutorials/docs/page
https://kb.example.com/docs/tutorial.html
https://example.com/resources/documentation/tutorial.html
http://example.com/reference/help/documentation.html
https://developer.example.com/docs/tutorials/index.html
http://support.example.com/documentation/overview
https://resources.example.com/docs/v1/tutorial
https://example.com/how-to/documentation
http://example.com/api/reference/docs
https://example.com/reference/v2/index.html
http://example.com/docs/resources/api.html

Negative Examples

The following URLs should not match the regex:

https://example.com/documentary
http://helpful.example.com
https://manuals.example.com
http://example.com/references
https://example.com/resourceful
http://example.com/wiki-books
https://apiary.example.com
http://example.com/documents
http://example.com/documentable
https://help-center.example.com
http://manual.example.com/docsystem
https://example.com/resourcesful
http://api.example.comary
https://example.net/instructions-v1
http://example.org/learned-tutorial
http://example.com/support-center

Author

Jeremy Georges-Filteau
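As a quick way to check the examples, here is a small sketch that builds the pattern with the `^.*` prefix and `.*$` suffix described in the Explanation and runs it against two of the listed URLs. Because the trailing `.` or `/` is optional and is followed by a catch-all suffix, a keyword that is only the start of a longer word (as in `helpful`) still matches. The `strictDocsUrl` variant is hypothetical and not part of the entry: it assumes the intent is for the keyword to be followed by `.`, `/`, or the end of the string.

```javascript
// Keyword list copied from the entry.
const KEYWORDS =
  'docs|documentation|help|guide|manual|reference|api|kb|support|resources|' +
  'wiki|developer|how-to|tutorials|examples|learn|instructions';

// Pattern as described in the Explanation section.
const docsUrl = new RegExp(`^.*(?:\\.|\\/)(${KEYWORDS})(?:\\.|\\/)?.*$`);

// Hypothetical stricter variant (an assumption, not the author's pattern):
// the keyword must be followed by ".", "/", or the end of the string.
const strictDocsUrl = new RegExp(`[./](?:${KEYWORDS})(?=[./]|$)`);

console.log(docsUrl.test('https://example.com/docs'));          // true
console.log(docsUrl.test('http://helpful.example.com'));        // true: "/help" plus any suffix
console.log(strictDocsUrl.test('https://example.com/docs'));    // true
console.log(strictDocsUrl.test('http://helpful.example.com'));  // false
```

Note that the stricter variant still accepts a URL such as http://manual.example.com/docsystem, because `manual` appears there as a complete subdomain label.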
Submitted by jgeofil

Community Library Entry


Regular Expression
Created·2024-06-14 22:32
Flavor·ECMAScript (JavaScript)

/((?![\u{23}-\u1F6F3]([^\u{FE0F}]|$))\p{Emoji}(?:(?!\u{200D})\p{EComp}|(?=\u{200D})\u{200D}\p{Emoji})*)/gmu

Description

Purpose

I wanted to make a regex that just works. This is it for ECMAScript engines.

Capabilities

Matches all 5024 emoji listed in emoji-test.txt on the official Unicode website as of 6/14/2024 (thank you not, Apple Intelligence).

This regex also rejects glyphs that must be part of a grapheme cluster but appear alone (more on this in the "Implementation" section).

-- These and similar fail
1 2 3 4 5 6 7 8 9 # * ‼ ↔

-- These succeed
Basic: 😀
Basic + Modifier: 🦸🏾
Basic + ZWJ + Basic: 🐦‍🔥
Basic + Modifier + ZWJ + Basic: ❤️‍🔥
Basic + ZWJ + Basic + Modifier: 🐻‍❄️
Basic + Modifier + ZWJ + Basic + Modifier + ZWJ + Basic + ZWJ + Basic + Modifier: 👩🏼‍❤️‍💋‍👩🏿

Where ZWJ means "Zero-Width Joiner," a Unicode character (U+200D) that allows composition between two separate emojis, e.g.:

😮‍💨 = 😮 (U+1F62E) + (U+200D) + 💨 (U+1F4A8)
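The composition above can be reproduced directly from the code points. This is a small sketch using the standard `String.fromCodePoint`; the variable names are illustrative.

```javascript
const face = String.fromCodePoint(0x1F62E); // 😮
const ZWJ = String.fromCodePoint(0x200D);   // zero-width joiner, invisible on its own
const wind = String.fromCodePoint(0x1F4A8); // 💨

// Joining the two emojis with a ZWJ yields the single glyph 😮‍💨 where supported.
const exhaling = face + ZWJ + wind;

// Spreading a string iterates by code point, not by UTF-16 code unit.
const codePoints = [...exhaling].map(
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints);      // [ 'U+1F62E', 'U+200D', 'U+1F4A8' ]
console.log(exhaling.length); // 5 (two surrogate pairs plus the ZWJ)
```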

Implementation

In order to make this expression robust against new emojis being created, I used the inherent Unicode structure of emojis to validate the string.

Emojis have the following structure:

-- BEGIN

\p{Emoji} -- Class of basic, single-character emoji

-- BEGIN Optional Section

-- Case 1: Any number of non-ZWJ modifiers (skin tone, hair, simple grapheme modifiers, etc.)
-- < negative look ahead for ZWJ >

\p{Emoji_Component}+

-- Case 2: ZWJ followed by basic emoji
-- < check for ZWJ >

\p{Emoji} -- We've composed a new emoji!

-- END Optional Section

-- * Repeat the optional section as many times as possible to get the longest chain of emojis joined by ZWJs

-- END
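The outline above can be sketched by assembling the pattern from named pieces. The names `BASE`, `MODIFIER`, `ZWJ_THEN_BASE`, and `emojiStructure` are illustrative, and this sketch deliberately leaves out the leading negative lookahead of the full regex, so it shows only the base-plus-optional-section structure.

```javascript
// Named pieces mirroring the outline (names are illustrative, not from the entry).
const BASE = String.raw`\p{Emoji}`;                             // a basic emoji
const MODIFIER = String.raw`(?!\u{200D})\p{Emoji_Component}`;   // Case 1: a non-ZWJ component
const ZWJ_THEN_BASE = String.raw`\u{200D}\p{Emoji}`;            // Case 2: ZWJ followed by a basic emoji

// A base emoji followed by the optional section, repeated as often as possible.
const emojiStructure = new RegExp(
  `${BASE}(?:${MODIFIER}|${ZWJ_THEN_BASE})*`,
  'gu'
);

// 👩🏼‍❤️‍💋‍👩🏿 written out as its ten code points.
const couple =
  '\u{1F469}\u{1F3FC}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}\u{1F3FF}';

const found = couple.match(emojiStructure);
console.log(found.length);         // 1: the whole chain is a single match
console.log([...found[0]].length); // 10 code points
```

The negative lookahead in `MODIFIER` is needed because U+200D is itself an `Emoji_Component`; without it, Case 1 could consume the joiner before Case 2 gets a chance.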

* The set defined by \p{Emoji} also contains characters that are not generally considered emojis, such as ©, ❄, or ✔. These glyphs may even be used to compose new emojis, as in the case of

🏋‍♂ = 🏋 (U+1F3CB) + (U+200D) + ♂ (U+2642)

When one of these glyphs is not part of a larger grapheme cluster, this regex rejects it. That is what the first negative lookahead checks: when the match starts on one of these glyphs, the next code point must be the variation selector (U+FE0F) that they require.

This variation is what turns ✔ into ✔️.
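A short sketch of the effect of the variation selector, run against the pattern exactly as it is given in this entry. The helper `matches` is a hypothetical convenience wrapper. One detail to be aware of: `\u1F6F3` written without braces is parsed as `\u1F6F` followed by a literal `3`, so the range in the lookahead ends at U+1F6F, and it is worth testing solitary symbols above that code point in your own engine.

```javascript
// The pattern as given in this entry (flags gmu).
const emojiRe =
  /((?![\u{23}-\u1F6F3]([^\u{FE0F}]|$))\p{Emoji}(?:(?!\u{200D})\p{EComp}|(?=\u{200D})\u{200D}\p{Emoji})*)/gmu;

// Hypothetical helper: every full match found in `text`.
const matches = (text) => Array.from(text.matchAll(emojiRe), (m) => m[0]);

// ✔ (U+2714) followed by the variation selector U+FE0F is matched as one unit.
const checkMark = '\u2714\uFE0F';
console.log(matches(checkMark).length); // 1

// Solitary digits, "#", and "*" are rejected by the lookahead.
console.log(matches('1 2 3 # *'));      // []

// A digit followed by U+FE0F and the keycap U+20E3 is accepted as one match.
console.log(matches('1\uFE0F\u20E3').length); // 1
```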

Also of note, there are some glyphs in this range that do act as conventional emojis, like ✅ (U+2705). These can also be written as ✅ (U+2705 U+200D), with a ZWJ added at the end. If you keep adding ZWJs, the rendered glyph does not change, but you will have more characters to backspace through (at least on my MacBook).

This logic only matters when the glyph is at the beginning of the match; otherwise it will be preceded by a ZWJ.

Longevity

So long as emojis are represented in the format specified above, this regex will be robust against new emojis being created because it uses character classes instead of fixed code point ranges.
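To illustrate the difference, here is a minimal comparison between a hard-coded range and the `\p{Emoji}` property. The `fixedRange` pattern is a hypothetical example of the fixed-range approach (it covers only the Emoticons block) and does not come from this entry.

```javascript
// Hypothetical hard-coded range: the Emoticons block only.
const fixedRange = /^[\u{1F600}-\u{1F64F}]$/u;

// Property-based check: follows the engine's Unicode data.
const byProperty = /^\p{Emoji}$/u;

const grinning = '\u{1F600}';  // 😀, inside the Emoticons block
const superhero = '\u{1F9B8}'; // 🦸, added later, outside that block

console.log(fixedRange.test(grinning));  // true
console.log(fixedRange.test(superhero)); // false: the range was never extended
console.log(byProperty.test(grinning));  // true
console.log(byProperty.test(superhero)); // true
```

How recent an emoji `\p{Emoji}` recognizes depends on the Unicode version shipped with the JavaScript engine.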

Submitted by anonymous