P.COMPILE - compile a pattern for pattern matching.

Usage:

B:
   %b/manif/pmatch
   ret = p.compile(pcode,pattern, [optbits,patlen] );
C:
   #include <pmatch.h>
   int p_compile(pm_code **pcode, const char *pattern,
                 [int opts, int *patlen]);

Examples:

status = p.compile(&pcode, "^^abc$");
if (0 > p.match(line, pcode))
    error("line did not match*n");

Where:

pcode
is the address of a variable where P.COMPILE can store a pointer to the compiled code. The stored pointer can be passed to P.MATCH and P.FREE.
pattern
points to a string containing the pattern to be compiled. See below for more information on patterns.
optbits
is a word of option bits indicating how the compiled pattern will be used. Currently, the only supported values are PM_C_String and PM_Random_String.
PM_C_String
says the compiled pattern will be applied to C strings. This is the default.
PM_Random_String
says the compiled pattern will be applied to arbitrary byte streams. See "expl b lib p.match" for more details.
patlen
is the address of a word where P.COMPILE will store the number of pattern characters that had been processed when compilation stopped. If P.COMPILE detects a syntax error in the pattern, you can use the returned "patlen" value to help you create an error message that indicates where P.COMPILE found the error. You can also use the returned "patlen" value to find the rest of the line after a pattern end character. If you do not need this value, omit the "patlen" argument or supply a null pointer (NULL in C, 0 in B).
ret
is the value PM_Ok (0) if P.COMPILE compiled the pattern successfully. If P.COMPILE found a syntax error, it returns the value PM_Syntax.

Description:

P.COMPILE "compiles" a pattern. The result is executable code that can be used to determine whether a given string contains a substring matching the pattern. Memory for the compiled code is allocated dynamically, and a handle is returned in the location pointed to by "pcode".

Actual pattern matching is done by passing this handle to the P.MATCH function. The memory used by the pattern code is released by calling P.FREE.

If the compilation fails, no memory is allocated, and the value stored at "*patlen" can be used to determine the offset into the pattern string where the error was detected.
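The following C sketch shows one way the offset stored at "*patlen" might be turned into a diagnostic. The function name pattern_error_text is hypothetical and is not part of the pmatch library; the sketch assumes that "patlen" is a zero-based count of the characters processed before compilation stopped.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build a two-line message: the pattern itself, then a caret under the
 * offset reported through "patlen". Hypothetical helper, not part of
 * pmatch. Assumes "patlen" is a zero-based character count.
 * Returns the number of characters written (excluding the final '\0'),
 * or -1 if "out" is too small.
 */
int pattern_error_text(const char *pattern, int patlen,
                       char *out, size_t outsize)
{
    size_t len = strlen(pattern);
    size_t pos = (patlen < 0) ? 0 : (size_t)patlen;
    size_t i;

    if (pos > len)
        pos = len;                      /* clamp to the end of the pattern */

    /* pattern + new-line + "pos" spaces + caret + '\0' */
    if (outsize < len + 1 + pos + 1 + 1)
        return -1;

    memcpy(out, pattern, len);
    out[len] = '\n';
    for (i = 0; i < pos; i++)
        out[len + 1 + i] = ' ';
    out[len + 1 + pos] = '^';
    out[len + 2 + pos] = '\0';
    return (int)(len + 2 + pos);
}
```

A caller would invoke this only when P.COMPILE returns PM_Syntax, passing the same pattern string and the value that was stored at "*patlen".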

Patterns:

The simplest pattern is just a string of normal characters. For example, "abc" matches the sequence 'a', 'b', 'c' anywhere in a string. By default, pattern matching ignores the case of letters, but you can change this by using the P.OPT function to turn off the 'd' special character option.

Patterns can also contain special characters. The special characters that can be used with P.COMPILE are a subset of the special characters recognized by the FRED text editor. Below, we summarize the special pattern characters that P.COMPILE recognizes.

.          -- any single character (except new-line)
^          -- start of line
$          -- end of line
P*         -- zero or more of pattern P
P+         -- one or more of pattern P
P|Q        -- pattern P or Q
(P)        -- same as pattern P
[XYZ...]   -- any character inside brackets
[^XYZ...]  -- any character not inside brackets
{P}T       -- pattern P with tag T
<          -- beginning of word
>          -- end of word
@(N)       -- null string before column N
@(-N)      -- null string after Nth last column

Complete explanations of these are given in "expl fred pattern" and in the FRED Reference Manual. P.COMPILE does NOT recognize the "fence" pattern character '#'. Since P.COMPILE doesn't do anything special with tagged patterns, the @T constructs are not supported. In addition, P.COMPILE does not recognize the '-' shorthand inside "[]".

By default, the special meanings of

$^.*[]{}|()

are turned on, while the special meanings of

<>@+

are turned off.

Putting "\c" in front of a character turns off its special meaning, while putting "\o" in front of a character turns on its special meaning. For example,

"\c["

is a pattern matching any string that contains an opening square bracket. The special meaning of "[" is ignored.
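When the text to be matched comes from a user, it can be convenient to apply "\c" mechanically. The C sketch below does this for the characters whose special meanings are on by default. The function name quote_pattern is hypothetical and is not part of the pmatch library; the sketch assumes the default escape character and the default set of special characters, and it makes no attempt to quote the escape character itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy "text" into "out", putting "\c" in front of every character
 * whose special meaning is on by default, so that the result matches
 * the text literally. Hypothetical helper, not part of pmatch.
 * Returns the length written, or -1 if "out" is too small.
 */
int quote_pattern(const char *text, char *out, size_t outsize)
{
    static const char specials[] = "$^.*[]{}|()";
    size_t n = 0;

    for (; *text != '\0'; text++) {
        int special = (strchr(specials, *text) != NULL);
        size_t need = special ? 3 : 1;

        if (n + need + 1 > outsize)     /* keep room for the '\0' */
            return -1;
        if (special) {
            out[n++] = '\\';
            out[n++] = 'c';
        }
        out[n++] = *text;
    }
    if (n + 1 > outsize)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Given the text a[1].b, this produces the pattern a\c[1\c]\c.b. If P.OPT has been used to change the escape character or to turn other special meanings on, the list in "specials" would have to be adjusted to match.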

Special characters can also be turned on or off by calling the P.OPT function. You must do this BEFORE you call P.COMPILE to compile the pattern. Once a pattern has been compiled, turning meanings on or off does not affect the compiled pattern's behavior.

You can represent non-printable or "hard to type" characters with "\ddd" where "ddd" are octal digits giving the ASCII representation of the character you wish to match. For example,

"\007"

matches any string that contains the BELL character.
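To make the "\ddd" notation concrete, the C sketch below computes the character value that such an escape denotes. The function name decode_octal_escape is hypothetical and is not part of the pmatch library; the sketch assumes exactly three octal digits, as in the example above, and does no range check on the result.

```c
/*
 * If "s" begins with the escape character followed by three octal
 * digits, return the value those digits denote; otherwise return -1.
 * Hypothetical helper, not part of pmatch.
 */
int decode_octal_escape(const char *s, char esc)
{
    int value = 0;
    int i;

    if (s[0] != esc)
        return -1;
    for (i = 1; i <= 3; i++) {
        /* a '\0' fails this test, so we never read past the string */
        if (s[i] < '0' || s[i] > '7')
            return -1;
        value = value * 8 + (s[i] - '0');
    }
    return value;
}
```

Note that a C string literal has its own backslash escapes. To hand the four characters \007 to p_compile from C source, the literal must be written "\\007"; writing "\007" places the BELL character itself in the string.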

In constructs like "\c" or "\007", the first character is called the "escape character". By default, the escape character is "\"; you can change this by calling the P.OPT function.

Normally, the pattern is terminated by a '*0' byte, but you can define other termination characters using P.OPT. This can be useful if you are parsing a string like "s/ab\c/def/xyz/". After scanning the initial "s/", you can call P.OPT to set '/' as a termination character, then pass the "ab\c/def/xyz/" sequence to P.COMPILE. P.COMPILE stops compiling the pattern after the 'f'. You can then use the value stored in "*patlen" to find the remainder of the string.
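The stopping rule in this example can be illustrated with a small C stand-in that looks for the first termination character not protected by "\c" or "\o". The function name pattern_end is hypothetical; it is not part of the pmatch library and is not how P.COMPILE is implemented. This document does not say whether the value stored at "*patlen" counts the termination character itself, so check the returned value before relying on it.

```c
#include <stddef.h>

/*
 * Return the offset of the first unescaped "term" character in "s",
 * or the length of "s" if there is none. Hypothetical stand-in for
 * the stopping rule; it understands only the two-character prefixes
 * "\c" and "\o" (with "esc" as the escape character).
 */
size_t pattern_end(const char *s, char esc, char term)
{
    size_t i = 0;

    while (s[i] != '\0' && s[i] != term) {
        if (s[i] == esc && (s[i + 1] == 'c' || s[i + 1] == 'o')
                && s[i + 2] != '\0')
            i += 3;     /* escape, 'c' or 'o', and the character it governs */
        else
            i += 1;
    }
    return i;
}
```

For the sequence ab\c/def/xyz/ with '/' as the termination character, the offset found is 8, which is the position just after the 'f'. The remainder of the string starts at that offset.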

See Also:

expl b lib p.free
for how to release the memory used for a compiled pattern.
expl b lib p.match
for how to search for a pattern match.
expl b lib p.opt
for setting options to control how the pattern is compiled.
expl fred pattern
for a longer discussion of regular expression patterns.

Copyright © 1996, Thinkage Ltd.