Regular Expressions 101

Community Patterns

How to get AC, Tax_id, optionally VAR_SEQ, and Sequence?

Regular Expression
Python

r"
(?ms)ID\s+.*?^AC\s+(\w+);.*?^OX\s+NCBI_TaxID=(\d+).*?(?#how to optionally capture group ^FT\s+VAR_SEQ.*?\/FTId=\w+\. ).*?^\s{5}(.*?)//
"
gx

Description

  1. (?ms)ID\s+.*?^AC\s+(\w+);.*?^OX\s+NCBI_TaxID=(\d+).*?(?#how to optionally capture group ^FT\s+VAR_SEQ.*?\/FTId=\w+\. ).*?^\s{5}(.*?)//
  2. (?ms)(FT\s+VAR_SEQ.*?\/FTId=\w+\.)

Regex (1) captures AC, Tax_id, and Sequence. These three fields are always present in an entry, but the VAR_SEQ field is optional. Regex (2) captures the VAR_SEQ lines. How can these two regexes be combined into one, so that VAR_SEQ is captured when present and skipped otherwise?
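
One possible way to combine them, offered as a sketch rather than a definitive answer: wrap regex (2) in an optional non-capturing group, and replace the free-running `.*?` in front of it with a "tempered" dot, `(?:(?!^//).)*?`, which refuses to step over the `//` terminator. Without that guard, the lazy `.*?` under the `s` flag could run into the next entry and pick up its VAR_SEQ. The names `ENTRY_RE`, `parse_entries`, and `SAMPLE` are hypothetical, and the sample records are made up, assuming a flat-file layout where VAR_SEQ features end in `/FTId=...`. Only the first VAR_SEQ feature of an entry is captured.

```python
import re

# Combined pattern: groups 1, 2 and 4 are always present; group 3 (VAR_SEQ)
# is None when the entry has no such feature.
ENTRY_RE = re.compile(
    r"^ID\s+.*?"
    r"^AC\s+(\w+);.*?"
    r"^OX\s+NCBI_TaxID=(\d+)"
    # Optional part: advance lazily, but never past a line starting with "//",
    # so a VAR_SEQ from a later entry cannot be picked up by mistake.
    r"(?:(?:(?!^//).)*?(^FT\s+VAR_SEQ.*?/FTId=\w+\.))?"
    # Sequence block: first line indented by five whitespace characters,
    # up to the "//" terminator line.
    r".*?^\s{5}(.*?)^//",
    re.MULTILINE | re.DOTALL,
)

# Made-up sample data (assumption): one entry without VAR_SEQ, one with it.
SAMPLE = """\
ID   TEST1_HUMAN             Reviewed;         10 AA.
AC   P00001;
OX   NCBI_TaxID=9606;
SQ   SEQUENCE   10 AA;  1111 MW;  0000000000000000 CRC64;
     MKTAYIAKQR
//
ID   TEST2_MOUSE             Reviewed;         10 AA.
AC   P00002;
OX   NCBI_TaxID=10090;
FT   VAR_SEQ       1     5       Missing (in isoform 2).
FT                                /FTId=VSP_000001.
SQ   SEQUENCE   10 AA;  1111 MW;  0000000000000000 CRC64;
     MSTNPKPQRK
//
"""


def parse_entries(text):
    """Return one dict per entry; 'var_seq' is None when the field is absent."""
    results = []
    for match in ENTRY_RE.finditer(text):
        ac, tax_id, var_seq, sequence = match.groups()
        results.append({
            "ac": ac,
            "tax_id": tax_id,
            "var_seq": var_seq,
            # Drop spaces and newlines inside the sequence block.
            "sequence": "".join(sequence.split()),
        })
    return results


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Placing the optional group directly after the Tax_id capture means the regex engine tries the VAR_SEQ branch first and falls back to skipping it only when no VAR_SEQ line exists before the entry's `//`.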

Submitted by anonymous - 7 years ago