Regular Expressions 101

Regular Expression
No Match

Generated Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        final String regex = "best practices ";
        final String string = "BEST PRACTICES – CODING STANDARDS\n\n\n"
	 + "AUTOSYS:\n\n"
	 + "1.	NRETRY OPTION Feature - Extract jobs querying against databases should use Autosys N-retry option 3-4 times before failure.\n"
	 + "2.	Health Check job/Dummy Job/Polling Job mechanism should be designed to have better operations controls (Holding, Releasing Jobs)\n"
	 + "3.	The server names should not be hardcoded in Autosys JILs. Wide IP/ Alias/Virtual Machine names should be used.\n"
	 + "4.	Multifilewatcher concept should be pursued instead of file watcher to reduce bulk failures during planned outage.\n"
	 + "5.	Separate PROD and COB Schedules - Maintain separate Autosys jobs for PROD and COB loads, wherever DUAL LOAD mechanism exists. This will help us to HOLD/Release all the Jobs, which are connecting to a particular DB/Server.\n"
	 + "6.	Restrict # of multiple unloads from source DB using Autosys feature of controlling # of parallel jobs. Have a health check job where we are having a scenario of multiple unloads from a Database.\n"
	 + "7.	Resource concept should be used in Autosys to achieve load balancing and avoid performance bottlenecks due to high number of parallel executions.\n\n"
	 + "8.	Job Naming Convention- Job Name should specify as much info as possible. Eg,1. A short code for the project/application name, 2. Type of Job (Extract, Transformation, Load, File Watcher etc.), 3. Infra getting used (DB Name alias) \n\n"
	 + "9.	The Autosys JILs should have a comment describing the brief function of the job. \n"
	 + "  Eg: description: \"This job is to unload the data from XYZ EDW table and to load ABC folder/system\"\n"
	 + "           Description: \"This job is validates the source file and generates load ready file\"\n\n"
	 + "10.	 There should not be any jobs without a box unless it is a standalone job. \n\n"
	 + "11.	Schedules should have Time dependency as well as trigger dependency (If Source is file or Source DB have trigger mechanism to mark the completion of loading of data)\n"
	 + "12.	Lookback feature should be added to avoid accidental execution of schedule by referring to the success state of previous schedule.\n"
	 + "13.	Jobs to be put on schedule only when the project is planned to be LIVE.\n"
	 + "14.	Use bulk and Intermediate mode concept based on data volume\n"
	 + "15.	Applications should not be scheduled during Green Zone \n"
	 + "16.	Projects that are obsolete in prod needs to have jobs decommissioned instead of being ON-ICE\n"
	 + "17.	 Deleting Jobs - While Jobs are being decommissioned, proper dependency analysis should be done. The Box should be deleted if all the Jobs in that box is to be decommissioned.\n\n\n\n\n\n\n\n"
	 + "TRIGGER FILES:\n\n"
	 + "1.	All the triggers and source files should be cleaned up/archived after processing.  \n"
	 + "2.	Files being sent to BI should have data files and trigger file to ensure downstream receives the data file and processes it. Eg: ODS sends files to BI, each data file is accompanied with trigger file.\n\n\n\n"
	 + "CLEANUP & UTILITY SCRIPTS:\n\n"
	 + "File Retention Policies/Cleanup/Space Monitoring – \n"
	 + "1.	Define retention policy for every file – Input, output, intermediate, reject, error files etc.\n"
	 + "2.	Ensure the daily cleanup process is setup for every feed in production and new feeds being promoted to production based on retention policy.\n"
	 + "3.	Ensure that all new projects have space allocation done to all the required mounts / directories based on retention policy before implementation of the project to production and ensure space-monitoring scripts are available.\n"
	 + "4.	Log file locations and not coding to have any files in /tmp or any other non-project locations.\n"
	 + "5.	Auto cleanup of files including inbound, outbound, error, log, Autosys logs\n"
	 + "6.	Automated reconciliation controls in place to catch issues in data so that to avoid any manual work around and fix issues in a timely manner.\n"
	 + "7.	Reject handling frameworks- Rejects should be captured and sent to downstream/ users\n"
	 + "8.	\n\n"
	 + "FILE HANDLING:\n\n"
	 + "1.	 Header / trailers should be encouraged to assist reconciliation process and help in audit/controls .Ex: Provide record /data count in file.\n"
	 + "2.	Source file having transactional Data with Dates : A sanity check should be done to ensure that duplicate data is not being processed\n"
	 + "3.	Put controls to ensure the Files containing PII information is contained in a special directory which is not accessible and secure (Not use persistent access, but use a specialized FID)\n\n"
	 + "AUDIT & CONTROLS:\n"
	 + "1.	NDM Jobs – Transmission controls to ensure all data has been transferred successfully\n"
	 + "2.	FID - Each Application ( Example: Global Fraud, REW, Genesis)  within BI should have their own Unique FID’s for batch processing\n"
	 + "3.	Alerts – Only actionable alerts should be set up for Production Management. \n"
	 + "4.	Access permissions - As of now all Codes, parm, config, env are in FID, which is high risk. Code and environment parameters should be owned by an FID which would be accessible by only SA/Admin\n\n\n\n\n\n\n"
	 + "EAP –\n\n"
	 + "1.	Edge node - Identify Key Players for EAP in terms of which team dictates App to use Edge Nodes, team responsible to allocate maximum space for a give App etc. Currently there are 3 Edge Nodes and all new projects are using same Edge Node - bdswr402x20d1.nam.nsroot.net. \n"
	 + "2.	Include Kerberos Authentication script in jobs to ensure continuity of job run else it will time out/terminate after 8hrs. Kerberos Authentication is at FID level so if multiple teams are using same FID, one job in any of these teams should take care of the authentication\n"
	 + "3.	CATE framework needs to be flexible to provide custom paths for applications to write their log/error files etc.  For debugging purpose. Currently all Talend logs are dumped into this location /var/log/talend/.\n"
	 + "4.	All the Autosys logs should be captured in application specific edge node location. Admin should have space limit set to how much each FID can write on /home directory.\n"
	 + "a.	As per current design , Autosys logs are being captured in /home \n\n"
	 + "              std_out_file: \"/home/gcgeeapnatldp/autosyslogs/$AUTO_JOB_NAME.$AUTORUN.out\"\n"
	 + "std_err_file: \"/home/gcgeeapnatldp/autosyslogs/$AUTO_JOB_NAME.$AUTORUN.err\"\n\n\n";
        
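        // Pattern.MULTILINE only changes how ^ and $ behave. This regex uses neither,
        // so the flag has no effect here. Matching stays case-sensitive by default.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        
        // Print every match, followed by its capture groups (this regex defines none).
        while (matcher.find()) {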
            System.out.println("Full match: " + matcher.group(0));
            
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Java, please visit: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
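
The page reports "No Match" for this regex. One likely reason is case: the pattern "best practices " is lowercase, while the only occurrence of that phrase in the test string is the uppercase heading, and java.util.regex matches case-sensitively unless told otherwise. The sketch below compares the two behaviours on the first line of the test string only. The class name CaseInsensitiveExample and the helper countMatches are hypothetical names introduced for illustration, and whether a case-insensitive match is what was intended is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveExample {
    // Hypothetical helper: counts how often regex matches input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        final Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        final String regex = "best practices ";
        // First line of the test string only; \u2013 is the en dash from the original.
        final String input = "BEST PRACTICES \u2013 CODING STANDARDS";

        // Flags = 0 is the default, case-sensitive behaviour.
        System.out.println("Case-sensitive matches: "
                + countMatches(regex, 0, input));
        // Assumption: the intent was to find the heading regardless of case.
        System.out.println("Case-insensitive matches: "
                + countMatches(regex, Pattern.CASE_INSENSITIVE, input));
    }
}
```

With the default flags the count is 0, and with Pattern.CASE_INSENSITIVE it is 1. The same effect can be had by prefixing the regex with the inline flag (?i), which may be more convenient when the pattern is stored as a plain string.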
