PATTERNS - the FRED pattern matcher.

Summary of Patterns:

.          -- any single character (except new-line)
^          -- start of line
$          -- end of line
P*         -- zero or more of pattern P
P+         -- one or more of pattern P
P|Q        -- pattern P or Q
(P)        -- same as pattern P
[XYZ...]   -- any character inside brackets
[^XYZ...]  -- any character not inside brackets
{P}T       -- pattern P with tag T
<          -- beginning of word
>          -- end of word
@(N)       -- null string before column N
@(-N)      -- null string after Nth last column
@T         -- matches whatever was last matched by
              {P} with tag T
\E(name)   -- defined pattern
#          -- fence operation

Description:

Patterns are constructs that represent a set of character strings. A pattern matches a string if the string is in the set represented by the pattern. As a simple example,

/hello/

is a pattern that matches the string hello. Normally, FRED ignores whether characters are in upper or lower case, so the pattern also matches strings like HELLO, Hello, and hELLo. You can tell FRED to pay attention to the case of letters with the O-SD option.

Patterns need not be as simple as the one above. The pattern matching facility of FRED (called simply the Pattern Matcher) can handle very complicated patterns that match very diverse sets of strings. Because there are so many possible ways to specify a pattern, it is best to define the patterns accepted by FRED in a fairly rigorous manner. This is done below.

Note that we will use the S command to illustrate some of the patterns we describe.

s/pattern/string/

is a command that puts the string in place of anything that matches pattern. Thus,

s/A/B/

changes every A in a line into a B.
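
For readers who know modern regular-expression libraries, the effect of this command can be approximated in Python. This is only an analogy and not FRED itself: the function name fred_like_sub is made up for illustration, and Python's pattern syntax differs from FRED's in many details. The sketch assumes the default behaviour described above (letter case is ignored, and every match in the line is replaced).

```python
import re

def fred_like_sub(pattern: str, replacement: str, line: str) -> str:
    """Approximate FRED's s/pattern/string/ on a single line.

    Hypothetical helper: it takes Python regex syntax, not FRED syntax.
    """
    # re.IGNORECASE mimics FRED's default of ignoring letter case.
    return re.sub(pattern, replacement, line, flags=re.IGNORECASE)

print(fred_like_sub("A", "B", "A cat sat"))
```

Both the upper case A and the lower case a's are replaced, because case is ignored.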

Quasi-Patterns

If N is a positive integer,

N/pattern/

defines a quasi-pattern which matches the Nth occurrence of pattern in any line. Thus you can say such things as

s2/a/x/ zu3/y/ zl1/v/

and so on. In the same way,

-N/pattern/

is a quasi-pattern that matches the Nth occurrence of pattern from the end of any line.

We call these quasi-patterns instead of genuine patterns because these constructions are not valid in line addresses or in G and T commands (where they would be ambiguous). Otherwise, quasi-patterns can be used anywhere that a normal pattern can.
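
The idea of a quasi-pattern can be sketched in Python as follows. The helper replace_nth is hypothetical and uses Python regex syntax. It assumes that occurrences are counted as non-overlapping matches found from left to right, which may differ from FRED for patterns whose matches can overlap.

```python
import re

def replace_nth(pattern: str, replacement: str, line: str, n: int) -> str:
    """Replace only the nth match of pattern in line.

    n > 0 counts from the start of the line, n < 0 from the end.
    Hypothetical helper; not part of FRED.
    """
    if n == 0:
        raise ValueError("n must be non-zero")
    matches = list(re.finditer(pattern, line, flags=re.IGNORECASE))
    index = n - 1 if n > 0 else n
    if not -len(matches) <= index < len(matches):
        return line  # fewer occurrences than requested: leave the line alone
    m = matches[index]
    # The replacement is inserted literally, with no special characters.
    return line[:m.start()] + replacement + line[m.end():]

print(replace_nth("a", "x", "banana", 2))    # like s2/a/x/
print(replace_nth("a", "x", "banana", -1))   # like s-1/a/x/
```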

The constructions described above are the only patterns accepted by FRED. No other patterns are valid. No pattern will match strings that spread across more than one line.

Activating Patterns

Some of the constructions defined above can be activated and deactivated using the O+S or O-S commands. The default options are

O+S^$.*[\E&D-
O-S{(|+#@

Placing \C in front of one of the special pattern characters tells FRED to take the character literally, ignoring any special meaning. Placing \O in front of a pattern character tells FRED to use the character's special meaning, even if that meaning has been disabled by option commands. For example, if O+S* is in effect,

s/A\C*/X/

will change the string A* to the letter X. This is far different from

s/a*/x/

In the same way, even if O-S| is in effect

s/A\O|B/C/

will change A or B into C.
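
As a loose analogy, Python's re module uses a backslash (or the re.escape function) where FRED uses \C: it makes a special character literal. Python has no counterpart to \O or to the O+S and O-S options, so the sketch below only illustrates the literal case.

```python
import re

# re.escape("A*") produces a pattern for the two literal characters A*,
# roughly what A\C* means to FRED.
literal_star = re.compile(re.escape("A*"))
# Without escaping, * keeps its special meaning: zero or more A's.
starred_a = re.compile("A*")

changed = literal_star.sub("X", "A*B")
print(changed)
print(starred_a.fullmatch("AAA") is not None)
print(literal_star.fullmatch("AAA") is not None)
```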

Pattern Delimiters

Throughout this discussion of patterns we have used the slash / to delimit our patterns. While this is the general practice, FRED will accept any other non-alphabetic non-numeric character as a delimiter for patterns in S commands, T commands, and so on. However, patterns acting as line addresses can only be delimited by / and ?.
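
The delimiter rule for S commands can be sketched as a small parser. The function split_substitute is hypothetical. It assumes the closing delimiter is always present, and it does not handle quasi-pattern counts such as s2/a/x/, delimiters escaped inside the pattern, or anything after the closing delimiter.

```python
from typing import Tuple

def split_substitute(command: str) -> Tuple[str, str]:
    """Split 's<d>pattern<d>string<d>' into (pattern, string).

    <d> is whatever character follows the s. Hypothetical sketch only.
    """
    if len(command) < 2 or command[0] not in "sS":
        raise ValueError("not an S command")
    delimiter = command[1]
    if delimiter.isalnum():
        raise ValueError("delimiter must be non-alphabetic and non-numeric")
    parts = command[2:].split(delimiter)
    if len(parts) != 3 or parts[2] != "":
        raise ValueError("expected s<d>pattern<d>string<d>")
    return parts[0], parts[1]

print(split_substitute("s/A/B/"))
print(split_substitute("s,a/b,c,"))   # comma as delimiter, slash is literal
```

Choosing a delimiter other than the slash is convenient when the pattern itself contains a slash, as in the second call.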

Pattern Examples

/ABCD/
matches ABCD anywhere in the line.
/A(B|C)+D/
matches a string beginning with A, ending with D and having one or more B's and/or C's in between.
/^BEGIN.*END$/
matches any line beginning with BEGIN and ending with END.
/A[1234567890]/
matches A followed by a digit.
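
These four examples happen to be spelled the same way in Python's re syntax, so they can be tried out there. This is an assumed equivalence for these particular patterns only; the dictionary and the helper name matches are illustrative and not part of FRED.

```python
import re

# Python spellings of the four FRED examples (assumed equivalents).
examples = {
    "ABCD": re.compile(r"ABCD", re.IGNORECASE),
    "A(B|C)+D": re.compile(r"A(B|C)+D", re.IGNORECASE),
    "^BEGIN.*END$": re.compile(r"^BEGIN.*END$", re.IGNORECASE),
    "A[1234567890]": re.compile(r"A[1234567890]", re.IGNORECASE),
}

def matches(fred_pattern: str, line: str) -> bool:
    """Return True if the pattern matches somewhere in line."""
    return examples[fred_pattern].search(line) is not None

print(matches("A(B|C)+D", "ABCBD"))   # B's and C's between A and D
print(matches("A(B|C)+D", "AD"))      # nothing between A and D
```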

The Pattern Matcher's Search Process

The Pattern Matcher searches for a string to match a given pattern by moving across a line column by column. In other words, it usually checks to see if there is a suitable string beginning in column 1, then beginning in column 2, and so forth. The exception is when it is searching for a quasi-pattern with a negative qualifier as in -1/A/. In this case, FRED begins looking for the pattern column by column from the end of the line instead of the beginning.
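
The column-by-column search can be sketched in Python. The function scan is hypothetical: it borrows Python's regex engine to test each starting column, and it reports 0-based positions rather than FRED's column numbers.

```python
import re
from typing import Optional, Tuple

def scan(pattern: str, line: str, from_end: bool = False) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first match found by a column scan.

    Starting columns are tried left to right, or right to left when
    from_end is True (as for a quasi-pattern with a negative qualifier).
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    columns = range(len(line) + 1)
    if from_end:
        columns = reversed(columns)
    for col in columns:
        # match() only succeeds if a suitable string begins exactly at col.
        m = compiled.match(line, col)
        if m:
            return m.start(), m.end()
    return None

print(scan("A.*", "BANANA"))                  # scan from the start
print(scan("A.*", "BANANA", from_end=True))   # scan from the end
```

Scanning from the start finds the first A and everything after it; scanning from the end finds only the last A.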

In general, FRED looks for the longest suitable string beginning in a given column. One exception to this rule occurs sometimes in patterns like

/ABC|AB|A/

which use the or-bar |. For any given column, FRED will first search for a string matching ABC, and will only search for AB and A if it does not find ABC. The pattern

/AB|A|ABC/

will look for AB beginning in a given column before it looks for A and ABC. Since any string that matches ABC also begins with AB, FRED will always find AB before it can find the full string ABC; the above will match AB or A, but never ABC.

A few more examples are given below.

/A|AB|ABC/
In a string ABCD, this matches A. Because FRED finds the match for A before AB or ABC, the AB and ABC are essentially useless in the pattern.
/ABC|AB|A/
In a string ABCD, this matches ABC.
/ABC|AB|A/
In a string AABCD, this will first match the first A and then the ABC.

In this way, FRED allows you to dictate which strings you prefer to match first.
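
Python's re module also tries alternatives from left to right, so it can be used to experiment with this ordering. The sketch below is an analogy only and says nothing about how FRED is implemented.

```python
import re

line = "ABCD"
first = re.search(r"A|AB|ABC", line).group()    # A is tried first and succeeds
second = re.search(r"ABC|AB|A", line).group()   # ABC is tried first and succeeds
third = re.search(r"AB|A|ABC", line).group()    # AB succeeds before ABC is tried
both = re.findall(r"ABC|AB|A", "AABCD")         # every match in AABCD

print(first, second, third, both)
```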

A side effect of this principle lets you tell FRED to find the shortest matching string instead of the longest. A pattern like

/A.*B/

matches the longest string beginning in A and ending in B. If, however, you try

/A(|.)*B/

you tell FRED that you would rather match the null string (before the or-bar) than an actual character. Thus FRED will be satisfied with the shortest match rather than the longest (and you will match the shortest string beginning with an A and ending with a B). If you have a line

ABAB

the command

s/A.*B/X/

will leave a line that only contains X, while

s/A(|.)*B/X/

will leave a line that contains XX (since both AB pairs turn into X).
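
The same contrast between longest and shortest match can be reproduced in Python. Note that this sketch swaps in a different technique: it uses Python's lazy quantifier .*? instead of FRED's trick of putting the null string before the or-bar. It is meant only to show the difference in results on the line ABAB.

```python
import re

line = "ABAB"
longest = re.sub(r"A.*B", "X", line)     # greedy: one match covers the whole line
shortest = re.sub(r"A.*?B", "X", line)   # lazy: each AB pair is matched separately

print(longest, shortest)
```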

Another consequence of FRED's approach to pattern matching is the way in which it handles a command like

s-1/A.*/X/

FRED moves column by column backwards across the line and matches /A.*/ with the first A it comes to. Thus the given command will change only the last A in the line and everything after it.

As a last example, consider the pattern

/{.*}a{.*}b/

In this case, the pattern tagged with a matches the entire contents of the line, and the pattern tagged with b matches the null string, because the first .* takes the longest string it can and leaves nothing for the second.
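
A comparable experiment in Python uses named groups in place of FRED's tags. The group names a and b stand in for the tags; this is an analogy in Python syntax, not FRED's own notation.

```python
import re

m = re.match(r"(?P<a>.*)(?P<b>.*)", "hello world")
tag_a = m.group("a")   # the first .* takes everything it can
tag_b = m.group("b")   # nothing is left for the second .*

print(repr(tag_a), repr(tag_b))
```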

Copyright © 1998, Thinkage Ltd.