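// Matches the literal prefix "EGA" immediately followed by 11 digits (e.g. EGA00001000001).
// Typed accessions such as EGAS00001000001 do not match, because a letter follows "EGA".
// The `m` flag has no effect here: the pattern contains no ^ or $ anchors.
const regex = /EGA\d{11}/gm;
// Alternative syntax using RegExp constructor
// const regex = new RegExp('EGA\\d{11}', 'gm')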
const str = `Skip to main content
Skip to local navigation
Skip to EBI global navigation menu
Skip to expanded EBI global navigation menu (includes all sub-sections)
EMBL European Bioinformatics Institute
Services
Research
Training
About us
European Nucleotide Archive
Examples: BN000065, histoneSearchAdvanced Sequence
Home
Search & Browse
Submit & Update
Software
About ENA
Support
ENA > Submit & Update > Data formats > Read domain formats
Read domain formats
This page provides information on the read domain formats in ENA. If you are planning to submit read data into ENA, please refer first to Submitting read data.
Data format
Metadata format
Accession number format
Archive generated fastq file format
Data format
Read data including bases, base qualities and alignments can be submitted in several different formats ... more information.
Metadata format
The metadata model consists of several objects each represented using XML ... more information.
Accession number format
Each metadata object is assigned a unique accession number by the archive. The accession numbers can be used to retrieve data and metadata using the EB-Eye search available at the top of all EBI web pages or using the free text search available on the ENA home page. The metadata is then retrieved and displayed through the ENA Browser as in the examples in the above table.
Accession numbers assocaited with read data assigned by EBI start with 'ER' and accession numbers assigned by NCBI and DDBJ start with 'SR' and 'DR', respectively. The third letter of the accession number indicates the type of the metadata object. EGA accession numbers start with 'EGA' with the fourth letter indicating the type of the metadata object.
The accession numbers have a fixed number of digits after the letters: six for ENA and eleven for EGA.
Metadata object Accession prefix Number of digits Example
Submission ERA, SRA, DRA 6 ERA000092
Sample ERS, SRS, DRS 6 ERS000081
Study ERP, SRP, DRP 6 ERP000016
Experiment ERX, SRX, DRX 6 ERX000398
Run ERR, SRR, DRR 6 ERR003990
Analysis ERZ, SRZ, DRZ 6 ERZ000001
EGA Submission EGA 11 EGA00001000001
EGA Sample EGAN 11 EGAN00001000001
EGA Study EGAS 11 EGAS00001000001
EGA Experiment EGAX 11 EGAX00001000001
EGA Run EGAR 11 EGAR00001000001
EGA Analysis EGAZ 11 EGAZ00001000001
EGA DAC EGAC 11 EGAC00001000001
EGA Policy EGAP 11 EGAP00001000001
EGA Data Set EGAD 11 EGAD00001000001
Archive generated fastq file format
Once made public, data submitted to ENA are available for download using ftp and Aspara. Detailed data download instructions are available here. Currently, both submitted data files and archive generated fastq files are made available for download. The naming and format of the generated fastq files are described below.
In general, one fastq file is created for each application read in a run. Please refer to the table below for full details:
Number of application reads Fastq Files Description
1
<run_accession>.fastq.gz
or
<run_accession>_1.fastq.gz
For experiments with single application reads only all reads will be made available in one fastq file.
2 <run_accession>_1.fastq.gz
<run_accession>_2.fastq.gz
<run_accession>.fastq.gz For paired experiments with two application reads reads will be made available in 1-3 fastq files. If a paired experiment is submitted with both application reads then the first reads will be in <run accession>_1.fastq.gz file, the second reads will be in <run accession>_2.fastq.gz, and any unpaired reads will be in <run accession>.fastq.gz file. In case a paired experiment is submitted containing only unpaired reads then only a single file will be created: <run accession>.fastq.gz.
> 2 <run_accession>_N.fastq.gz For experiments with more than two application reads (e.g. Complete Genomics or strobed PacBio) one fastq file is created for each application read, however, no empty fastq files are created.
The fastq file format is:
@<run accession>.<spot index>[ <spot name>\\[/<read index>\\]]
<bases>
+
<phred qualities, ASCII encoded starting with '!' (33)>
Field Description
<run accession> The Run accession. A spot is identified uniquely by the combination of the Run accession and the spot index.
<spot index> A positive integer assigned to the spots in the order in which they appear in the run. A spot is identified uniquely by the combination of the Run accession and the spot index.
<spot name> The spot name as it was provided by the submitter. In cases where the read name is missing or was removed by the archive this field is not present.
<read index> A positive integer assigned to the application reads in the order in which they appear in the spot: /1 for first application read and /2 for the second application read. In cases where the read name is missing or was removed by the archive this field is not present.
Examples
Single layout:
@ERR000017.1 IL6_554:7:1:249:322
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
??????????????????????????????>>>>>>
Paired (first read):
@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Paired (second read):
@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Single layout without read names:
@ERR000017.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
??????????????????????????????>>>>>>
Paired without read names (first read):
@ERR005143.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Paired without read names (second read):
@ERR005143.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
SOLiD color:
The first base is included before the SOLiD colors.
@ERR000451.1 VAB_S0103_20080915_542_14_17_70_F3
T33023230203102103223330020300233001
+
T<.6353&:#\$1%&(--27*%&%,
Submit & Update
Data formats
Accession numbers
Sequences
Reads
Read file formats
XML 1.5
XML 1.4
XML 1.3
XML 1.2
XML 1.1
XML 1.0
Library strategy
Trace
Projects
Taxonomy
Uploading data files
Reads
Sequences
Genome assemblies
Taxonomy
Sample checklists
Environmental
Epigenomic
Species BARCODE
Metadata model
Register submission account
Programmatic XML submissions
Programmatic tabulated submissions
Popular
Submit and update
Sequence submissions
Genome assembly submissions
Submitting environmental sequences
Citing ENA data
Rest URLs for data retrieval
Rest URLs to search ENA
Latest ENA news
09 Dec 2015: ENA Release 126
Release 126 of ENA's assembled/annotated sequences now available
02 Nov 2015: Change to Globus endpoint for public ENA data
The Globus endpoint for public ENA data is changing from ebi#ena to ebi#public ('ena' subfolder).
23 Sep 2015: ENA Release 125
Release 125 of ENA's assembled/annotated sequences now available
EMBL European Bioinformatics Institute
News
Brochures
Contact us
Intranet
Services
By topic
By name (A-Z)
Help & Support
Research
Overview
Publications
Research groups
Postdocs & PhDs
Training
Overview
Train at EBI
Train outside EBI
Train online
Contact organisers
Industry
Overview
Members Area
Workshops
SME Forum
Contact Industry programme
About us
Overview
Leadership
Funding
Background
Collaboration
Jobs
People & groups
News
Events
Visit us
Contact us
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK +44 (0)1223 49 44 44
Copyright © EMBL-EBI 2015 | EBI is an outstation of the European Molecular Biology Laboratory | Terms of use
OK This website uses cookies. By continuing to browse this site, you are agreeing to the use of our site cookies. To find out more, see our Terms of Use.`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for JavaScript, please visit: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
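The test string states that EGA accession numbers start with 'EGA', with the fourth letter indicating the type of the metadata object, yet the generated pattern only matches the untyped form (EGA plus 11 digits). The sketch below is one possible variation, not part of the generated snippet: it assumes the typed accessions (EGAN, EGAS, and so on) are also wanted, allows one optional uppercase letter after the prefix, and uses `String.prototype.matchAll` in place of the `exec` loop. The function name `findEgaAccessions`, the word-boundary anchors, and the short `sample` string are illustrative assumptions.

```javascript
// Hypothetical helper, not part of the generated snippet above.
// Assumption: typed accessions (EGAN, EGAS, ...) should be found as well,
// so the pattern allows one optional uppercase letter after "EGA".
function findEgaAccessions(text) {
    // \b keeps the match from starting or ending inside a longer token.
    const pattern = /\bEGA([A-Z])?(\d{11})\b/g;
    // matchAll requires the g flag and yields one match array per hit.
    return Array.from(text.matchAll(pattern), (m) => ({
        accession: m[0],
        typeLetter: m[1] ?? null, // null when no type letter is present
        index: m.index,
    }));
}

// Two rows taken from the accession table in the test string.
const sample = 'EGA Submission EGA 11 EGA00001000001\nEGA Study EGAS 11 EGAS00001000001';
for (const hit of findEgaAccessions(sample)) {
    console.log(`${hit.accession} (type: ${hit.typeLetter ?? 'none'}) at index ${hit.index}`);
}
```

Because `matchAll` works on its own copy of the regular expression, the shared `lastIndex` state and the zero-width guard from the loop above are not needed here.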