Regular Expressions 101

Generated Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
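        // Literal "EGA" followed immediately by eleven digits; the backslash is doubled for the Java string literal.
        final String regex = "EGA\\d{11}";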
        final String string = "Skip to main content\n"
	 + "Skip to local navigation\n"
	 + "Skip to EBI global navigation menu\n"
	 + "Skip to expanded EBI global navigation menu (includes all sub-sections)\n"
	 + "EMBL European Bioinformatics Institute\n"
	 + "Services\n"
	 + "Research\n"
	 + "Training\n"
	 + "About us\n"
	 + " European Nucleotide Archive\n"
	 + " \n"
	 + "Examples: BN000065, histoneSearchAdvanced Sequence\n"
	 + "Home\n"
	 + "Search & Browse\n"
	 + "Submit & Update\n"
	 + "Software\n"
	 + "About ENA\n"
	 + "Support\n"
	 + "ENA > Submit & Update > Data formats > Read domain formats\n\n"
	 + "Read domain formats\n"
	 + "This page provides information on the read domain formats in ENA. If you are planning to submit read data into ENA, please refer first to Submitting read data.\n\n"
	 + "Data format\n"
	 + "Metadata format\n"
	 + "Accession number format\n"
	 + "Archive generated fastq file format\n"
	 + "Data format\n"
	 + "Read data including bases, base qualities and alignments can be submitted in several different formats ... more information.\n\n"
	 + "Metadata format\n"
	 + "The metadata model consists of several objects each represented using XML ... more information.\n\n"
	 + "Accession number format\n"
	 + "Each metadata object is assigned a unique accession number by the archive. The accession numbers can be used to retrieve data and metadata using the EB-Eye search available at the top of all EBI web pages or using the free text search available on the ENA home page. The metadata is then retrieved and displayed through the ENA Browser as in the examples in the above table.\n\n"
	 + "Accession numbers associated with read data assigned by EBI start with 'ER' and accession numbers assigned by NCBI and DDBJ start with 'SR' and 'DR', respectively. The third letter of the accession number indicates the type of the metadata object. EGA accession numbers start with 'EGA' with the fourth letter indicating the type of the metadata object.\n\n"
	 + "The accession numbers have a fixed number of digits after the letters: six for ENA and eleven for EGA.\n\n"
	 + "Metadata object	Accession prefix	Number of digits	Example\n"
	 + "Submission	ERA, SRA, DRA	6	ERA000092\n"
	 + "Sample	ERS, SRS, DRS	6	ERS000081\n"
	 + "Study	ERP, SRP, DRP	6	ERP000016\n"
	 + "Experiment	ERX, SRX, DRX	6	ERX000398\n"
	 + "Run	ERR, SRR, DRR	6	ERR003990\n"
	 + "Analysis	ERZ, SRZ, DRZ	6	ERZ000001\n"
	 + "EGA Submission	EGA	11	EGA00001000001\n"
	 + "EGA Sample	EGAN	11	EGAN00001000001\n"
	 + "EGA Study	EGAS	11	EGAS00001000001\n"
	 + "EGA Experiment	EGAX	11	EGAX00001000001\n"
	 + "EGA Run	EGAR	11	EGAR00001000001\n"
	 + "EGA Analysis	EGAZ	11	EGAZ00001000001\n"
	 + "EGA DAC	EGAC	11	EGAC00001000001\n"
	 + "EGA Policy	EGAP	11	EGAP00001000001\n"
	 + "EGA Data Set	EGAD	11	EGAD00001000001\n"
	 + "Archive generated fastq file format\n"
	 + "Once made public, data submitted to ENA are available for download using ftp and Aspera. Detailed data download instructions are available here. Currently, both submitted data files and archive generated fastq files are made available for download. The naming and format of the generated fastq files are described below.\n\n"
	 + "In general, one fastq file is created for each application read in a run. Please refer to the table below for full details:\n\n"
	 + "Number of application reads	Fastq Files	Description\n"
	 + "1	\n"
	 + "<run_accession>.fastq.gz\n"
	 + "or\n"
	 + "<run_accession>_1.fastq.gz\n\n"
	 + "For experiments with single application reads only all reads will be made available in one fastq file.\n"
	 + "2	<run_accession>_1.fastq.gz\n"
	 + "<run_accession>_2.fastq.gz\n"
	 + "<run_accession>.fastq.gz	For paired experiments with two application reads reads will be made available in 1-3 fastq files. If a paired experiment is submitted with both application reads then the first reads will be in <run accession>_1.fastq.gz file, the second reads will be in <run accession>_2.fastq.gz, and any unpaired reads will be in <run accession>.fastq.gz file. In case a paired experiment is submitted containing only unpaired reads then only a single file will be created: <run accession>.fastq.gz.\n"
	 + "> 2	<run_accession>_N.fastq.gz	For experiments with more than two application reads (e.g. Complete Genomics or strobed PacBio) one fastq file is created for each application read, however, no empty fastq files are created.\n"
	 + "The fastq file format is:\n\n"
	 + "@<run accession>.<spot index>[ <spot name>\\[/<read index>\\]]\n"
	 + "<bases>\n"
	 + "+\n"
	 + "<phred qualities, ASCII encoded starting with '!' (33)>\n"
	 + "Field	Description\n"
	 + "<run accession>	The Run accession. A spot is identified uniquely by the combination of the Run accession and the spot index.\n"
	 + "<spot index>	A positive integer assigned to the spots in the order in which they appear in the run. A spot is identified uniquely by the combination of the Run accession and the spot index.\n"
	 + "<spot name>	The spot name as it was provided by the submitter. In cases where the read name is missing or was removed by the archive this field is not present.\n"
	 + "<read index>	A positive integer assigned to the application reads in the order in which they appear in the spot: /1 for first application read and /2 for the second application read. In cases where the read name is missing or was removed by the archive this field is not present.\n"
	 + "Examples\n"
	 + "Single layout:\n\n"
	 + "@ERR000017.1 IL6_554:7:1:249:322\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "??????????????????????????????>>>>>>\n"
	 + " \n\n"
	 + "Paired (first read):\n\n"
	 + "@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\n"
	 + " \n\n"
	 + "Paired (second read):\n\n"
	 + "@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/2\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\n"
	 + " \n\n"
	 + "Single layout without read names:\n\n"
	 + "@ERR000017.1\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "??????????????????????????????>>>>>>\n"
	 + " \n\n"
	 + "Paired without read names (first read):\n\n"
	 + "@ERR005143.1\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\n"
	 + " \n\n"
	 + "Paired without read names (second read):\n\n"
	 + "@ERR005143.1\n"
	 + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
	 + "+\n"
	 + "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\n"
	 + " \n\n"
	 + "SOLiD color:\n\n"
	 + "The first base is included before the SOLiD colors.\n\n"
	 + "@ERR000451.1 VAB_S0103_20080915_542_14_17_70_F3\n"
	 + "T33023230203102103223330020300233001\n"
	 + "+\n"
	 + "T<.6353&:#$1%&(--27*%&%,\n"
	 + "Submit & Update\n"
	 + "Data formats\n"
	 + "Accession numbers\n"
	 + "Sequences\n"
	 + "Reads\n"
	 + "Read file formats\n"
	 + "XML 1.5\n"
	 + "XML 1.4\n"
	 + "XML 1.3\n"
	 + "XML 1.2\n"
	 + "XML 1.1\n"
	 + "XML 1.0\n"
	 + "Library strategy\n"
	 + "Trace\n"
	 + "Projects\n"
	 + "Taxonomy\n"
	 + "Uploading data files\n"
	 + "Reads\n"
	 + "Sequences\n"
	 + "Genome assemblies\n"
	 + "Taxonomy\n"
	 + "Sample checklists\n"
	 + "Environmental\n"
	 + "Epigenomic\n"
	 + "Species BARCODE\n"
	 + "Metadata model\n"
	 + "Register submission account\n"
	 + "Programmatic XML submissions\n"
	 + "Programmatic tabulated submissions\n"
	 + "Popular\n"
	 + "Submit and update\n"
	 + "Sequence submissions\n"
	 + "Genome assembly submissions\n"
	 + "Submitting environmental sequences\n"
	 + "Citing ENA data\n"
	 + "Rest URLs for data retrieval\n"
	 + "Rest URLs to search ENA\n"
	 + "Latest ENA news\n"
	 + "09 Dec 2015: ENA Release 126\n"
	 + "Release 126 of ENA's assembled/annotated sequences now available\n\n"
	 + "02 Nov 2015: Change to Globus endpoint for public ENA data \n"
	 + "The Globus endpoint for public ENA data is changing from ebi#ena to ebi#public ('ena' subfolder).\n\n"
	 + "23 Sep 2015: ENA Release 125\n"
	 + "Release 125 of ENA's assembled/annotated sequences now available\n\n"
	 + " EMBL European Bioinformatics Institute\n"
	 + "News\n"
	 + "Brochures\n"
	 + "Contact us\n"
	 + "Intranet\n"
	 + "Services\n"
	 + "By topic\n"
	 + "By name (A-Z)\n"
	 + "Help & Support\n"
	 + "Research\n"
	 + "Overview\n"
	 + "Publications\n"
	 + "Research groups\n"
	 + "Postdocs & PhDs\n"
	 + "Training\n"
	 + "Overview\n"
	 + "Train at EBI\n"
	 + "Train outside EBI\n"
	 + "Train online\n"
	 + "Contact organisers\n"
	 + "Industry\n"
	 + "Overview\n"
	 + "Members Area\n"
	 + "Workshops\n"
	 + "SME Forum\n"
	 + "Contact Industry programme\n"
	 + "About us\n"
	 + "Overview\n"
	 + "Leadership\n"
	 + "Funding\n"
	 + "Background\n"
	 + "Collaboration\n"
	 + "Jobs\n"
	 + "People & groups\n"
	 + "News\n"
	 + "Events\n"
	 + "Visit us\n"
	 + "Contact us\n"
	 + "EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK     +44 (0)1223 49 44 44\n"
	 + "Copyright © EMBL-EBI 2015 | EBI is an outstation of the European Molecular Biology Laboratory | Terms of use\n"
	 + "OK This website uses cookies. By continuing to browse this site, you are agreeing to the use of our site cookies. To find out more, see our Terms of Use.";
        
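        // MULTILINE only changes how ^ and $ behave; this pattern uses neither, so the flag is harmless here.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        
        // Walk through every non-overlapping match in the test string.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            
            // The pattern has no capturing groups, so groupCount() is 0 and this loop body never runs.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }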
    }
}
Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Java, please visit: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
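
The accession table embedded in the test string lists EGA prefixes that carry a fourth type letter (EGAN, EGAS, EGAD and so on). The generated pattern expects a digit directly after "EGA", so it does not match those. The following is a minimal sketch, assuming the typed accessions are wanted as well. The class name AccessionFinder, the method findAccessions, and the widened pattern are illustrative assumptions and are not output of the code generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessionFinder {
    // Assumed pattern: "EGA" at a word boundary, an optional uppercase type letter,
    // then exactly eleven digits. The lookahead (?!\d) rejects longer digit runs.
    private static final Pattern EGA_ACCESSION =
            Pattern.compile("\\bEGA[A-Z]?\\d{11}(?!\\d)");

    // Returns every EGA-style accession found in the text, in order of appearance.
    public static List<String> findAccessions(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EGA_ACCESSION.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Three rows copied from the accession table in the test string above.
        String sample = "EGA Submission\tEGA\t11\tEGA00001000001\n"
                + "EGA Sample\tEGAN\t11\tEGAN00001000001\n"
                + "Run\tERR, SRR, DRR\t6\tERR003990\n";
        for (String accession : findAccessions(sample)) {
            System.out.println(accession);
        }
    }
}
```

With the three sample rows, this sketch should report EGA00001000001 and EGAN00001000001 and skip the six-digit ENA accession ERR003990. Compiling the pattern once into a static field avoids recompiling it on every call.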