Regular Expressions 101

Community Patterns

Match what we don't want, to replace with empty


Regular Expression
PCRE (PHP <7.3)

/^(?!settlement-id($|\t.*$)).*/gm
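To try the pattern outside the original tool, Python's `re` module works well. It is not PCRE, but it supports everything this pattern uses: the negative lookahead, `^`/`$` anchors, alternation and `\t`. The sketch below is a minimal illustration. It assumes tab-separated lines and uses a short made-up sample. `re.MULTILINE` stands in for the `m` flag, and `re.sub` already replaces every match, like `g`. The helper name `header_lines` is hypothetical.

```python
import re

# The community pattern: a whole line that does NOT start with "settlement-id"
# followed by either a tab or the end of the line.
NOT_HEADER = re.compile(r"^(?!settlement-id($|\t.*$)).*", re.MULTILINE)

def header_lines(text: str) -> list[str]:
    """Hypothetical helper: blank out every non-header line, then drop empty lines."""
    blanked = NOT_HEADER.sub("", text)
    return [line for line in blanked.split("\n") if line]

# Made-up sample: two concatenated files, the second one without promotion-id.
sample = (
    "settlement-id\tsku\tquantity-purchased\tpromotion-id\n"
    "123\tABC\t1\tPROMO1\n"
    "settlement-id\tsku\tquantity-purchased\n"
    "456\tDEF\t2\n"
)
print(header_lines(sample))
```

A line such as "settlement-idX" is not treated as a header. After "settlement-id", the lookahead requires either a tab or the end of the line.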

Description

This was necessary for cases such as the Amazon settlement reports. To import these, we concatenate many files into one, so the header in this concatenated file is repeated as many times as there are files. We have seen that if there are no sales with a promotion for a given period, the file for that period does not contain the promotion-id column or header:

• with promotion-id: "settlement-id settlement-start-date settlement-end-date deposit-date total-amount currency transaction-type order-id merchant-order-id adjustment-id shipment-id marketplace-name amount-type amount-description amount fulfillment-id posted-date posted-date-time order-item-code merchant-order-item-id merchant-adjustment-item-id sku quantity-purchased promotion-id"
• without promotion-id: "settlement-id settlement-start-date settlement-end-date deposit-date total-amount currency transaction-type order-id merchant-order-id adjustment-id shipment-id marketplace-name amount-type amount-description amount fulfillment-id posted-date posted-date-time order-item-code merchant-order-item-id merchant-adjustment-item-id sku quantity-purchased"

We still want to import all data from the file without promotion-id, since no header column name has changed and the order of the columns has not changed either. This is just an omission, because there is no data for this column in that period. As long as these 2 facts are true:

• No header name was changed
• No header name is found in a different position in the header

then the header is considered valid and the import should proceed without a header error.
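Taken together, the two conditions mean that a found header is valid when its column names match the first N names of the stored header, in the same order. Columns may therefore only be missing from the end: dropping a column in the middle would shift every later column to a different position. The Python sketch below encodes that reading. It assumes tab-separated headers. The function `is_valid_header` and the shortened `STORED` header are hypothetical illustrations, not part of the original import.

```python
def is_valid_header(found: str, stored: str, sep: str = "\t") -> bool:
    """Hypothetical check: every column in `found` keeps its stored name and
    position, so columns may only be missing from the end."""
    found_cols = found.split(sep)
    stored_cols = stored.split(sep)
    # A slice of the stored header can never be longer than the stored header,
    # so a found header with extra columns also fails this comparison.
    return found_cols == stored_cols[: len(found_cols)]

# Shortened, made-up stored header for illustration.
STORED = "settlement-id\tsku\tquantity-purchased\tpromotion-id"

print(is_valid_header("settlement-id\tsku\tquantity-purchased", STORED))  # True: promotion-id omitted at the end
print(is_valid_header("settlement-id\tquantity-purchased\tsku", STORED))  # False: columns moved
```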
The header is accepted even though it only partially matches the stored header. This list is also used to delete the headers, and the partially matching instances of the header, from the files to be imported.

METHODOLOGY

To search the file for these instances of the header, we search for any line that starts with the name of the first column, "settlement-id" in this case. This method assumes the following:

• No value in the "settlement-id" column contains the word "settlement-id"

If this is true, then any line that begins with "settlement-id" is a header. To search for these lines we had 2 options:

  1. epRegExReplace( Expression; Replacement; Target {; "Options" } ). Basically, search with regular expressions: match the lines that do NOT begin with "settlement-id", replace them with empty, then remove the empty lines.
  2. FilterList ( ListA ; Attribute ; ListB ; CaseSensitive )

In our test, epRegExReplace was almost 500 times faster than FilterList.
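To check that both approaches select the same lines, they can be mimicked in Python. This is an approximation, not the calculation functions named above. Option 1 uses `re.sub` with the community pattern. Option 2 uses a plain line-by-line filter as a stand-in for `FilterList`. The names `headers_by_regex`, `headers_by_filter` and the sample data are made up for illustration. The speed difference reported above was measured with the original functions and is not reproduced here.

```python
import re

NOT_HEADER = re.compile(r"^(?!settlement-id($|\t.*$)).*", re.MULTILINE)

def headers_by_regex(text: str) -> list[str]:
    # Option 1: blank every non-header line in one pass, then drop empty lines.
    return [line for line in NOT_HEADER.sub("", text).split("\n") if line]

def headers_by_filter(text: str) -> list[str]:
    # Option 2 stand-in: test each line separately, keeping lines that are
    # exactly "settlement-id" or that start with "settlement-id" plus a tab.
    return [
        line
        for line in text.split("\n")
        if line == "settlement-id" or line.startswith("settlement-id\t")
    ]

# Made-up concatenated data: two header lines, two data lines.
data = "settlement-id\tsku\n1\tA\nsettlement-id\tsku\n2\tB\n"
print(headers_by_regex(data) == headers_by_filter(data))
```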
Submitted by anonymous - 4 years ago