TIME - parsing dates and times.

The .TOSEC function and various other functions in the UW Tools library convert date/time strings like "June 21, 1993" into formats that are more convenient for machine manipulation. To do this, they make use of a "date/time parsing table" which specifies rules for converting date/time strings into other representations.

The standard distribution of UW Tools comes with a date/time parsing table for dates in English. This table is declared as the external variable ".T_ENG". The table should be adequate to the needs of sites where English is the only language used.

Other sites may wish to create date/time parsing tables for other languages or may wish to extend the English-language parsing tables to support additional notations. Therefore, the rest of this explain file describes the format of a date/time parsing table. We emphasize that this material will only be of interest to programmers who wish to create their own such table or to modify the existing table.

Parsing Table Format:

A date/time parsing table is represented as a vector of structures; in the C programming language, these structures are declared to have the type "_T_parse", a type defined in <t_ctrl.h>. Once you have constructed a date/time parsing table, you can pass it as an argument to the following library functions:

.TLANG
uses the table to set defaults for all subsequent date/time parsing.
.TOSEC
accepts a date/time parsing table as an argument for parsing a particular date/time string. (If you just want to use the default table, you don't have to pass this argument.)

The "_T_parse" structure type has five fields:

char *str;
a string that might be found in a date or time. This may be a simple component of the date (like "january" or "wednesday"). It may also be a modifier telling how to interpret another component of the date. For example, in the string
10 days ago 

the "10" is a simple component of the date, "days" is a modifier giving the units of the integer 10, and "ago" is another modifier indicating that "10 days" is a negative offset from the current point in time.

Letters in "str" should be given in lowercase. The pointer should have the format used by the C programming language, with the upper half of the word containing a word address and the lower half containing a byte offset.

long type;
is a group of flags indicating how to interpret the string. For example, if "str" is "january", the "type" indicates that this is interpreted as a month. Types are specified using manifests defined in
<t_ctrl.h>       /* C programs */
b/manif/t_ctrl   /* B programs */

These manifests may be OR'ed together as appropriate.

int value;
is the value associated with the string. For example, if "str" is "january", the "value" is 1, indicating the first month.
int minchar;
is the minimum number of characters of "str" that the input must supply in order to be considered a match. For example, the "minchar" for "january" is 3, indicating that users may abbreviate the month down to "jan".
int maxchar;
is the maximum number of characters allowed in a match. This should be the length of "str".
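
The effect of "minchar" and "maxchar" can be sketched in C as follows. This is only an illustration: the structure name "parse_entry" and the function "entry_matches" are hypothetical stand-ins (the real type is "_T_parse" from <t_ctrl.h>), and the prefix-matching rule is an assumption based on the field descriptions above, not the library's actual code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for _T_parse; the real declaration is in <t_ctrl.h>. */
struct parse_entry {
    const char *str;   /* lowercase string to look for */
    long type;         /* flags saying how to interpret the string */
    int value;         /* value associated with the string */
    int minchar;       /* fewest characters accepted as a match */
    int maxchar;       /* most characters accepted as a match */
};

/* Return 1 if "word" is an acceptable abbreviation of e->str, else 0.
   Assumed rule: "word" must be a prefix of "str" (ignoring case) whose
   length lies between minchar and maxchar inclusive. */
int entry_matches(const struct parse_entry *e, const char *word)
{
    size_t len = strlen(word);
    size_t i;

    if (len < (size_t)e->minchar || len > (size_t)e->maxchar)
        return 0;
    if (len > strlen(e->str))
        return 0;
    for (i = 0; i < len; i++) {
        if (tolower((unsigned char)word[i]) != e->str[i])
            return 0;
    }
    return 1;
}
```

With an entry for "january" whose "minchar" is 3 and "maxchar" is 7, this sketch accepts "jan", "Janu" and "january" but rejects "ja" and "jun".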

Here are some sample "_T_parse" structure declarations:

{ "january",  _T_MONTH, 1, 3, 7 }
{ "february", _T_MONTH, 2, 3, 8 }
     /* and so on */
{ "today",     _T_TODAY_PLUS,  0, 5, 5 }
{ "tomorrow",  _T_TODAY_PLUS,  1, 8, 8 }
{ "yesterday", _T_TODAY_PLUS, -1, 9, 9 }
{ "a.m.",      _T_AM_PM,       0, 3, 4 }
{ "p.m.",      _T_AM_PM,      12, 3, 4 }

As you might guess, _T_TODAY_PLUS indicates dates relative to today. The flag _T_DAY_PLUS indicates dates relative to Sunday, as in

{ "sunday",  _T_DAY_PLUS, 0, 3, 6 }
{ "monday",  _T_DAY_PLUS, 1, 3, 6 }
{ "tuesday", _T_DAY_PLUS, 2, 3, 7 }

If you wish, you may specify time zones in the date/time parsing table. However, this usually isn't necessary; if .TOSEC finds a string that it doesn't recognize, it calls .TZQRY to see if the string is a valid time zone name as specified in the time zone definition file. Thus, .TOSEC automatically recognizes all defined time zone strings. If you want to define additional time zone strings, the "value" is the offset of the time zone from Greenwich Mean Time, expressed in minutes, as in

{ "est", _T_TIMEZONE,               60*5, 3, 3 }
{ "edt", _T_TIMEZONE | _T_DST, (60*5)-60, 3, 3 }

Notice that daylight saving time zones are marked with the flag _T_DST.
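
As a worked example of the "value" arithmetic, the following C sketch converts a local time of day to GMT. It assumes, from the sample entries above (where "est" has the positive value 300), that the offset counts minutes west of Greenwich and is therefore added to local time. The function name "local_to_gmt" is hypothetical and is not part of the library.

```c
#define MINUTES_PER_DAY (24 * 60)

/* Convert a local time of day (minutes after midnight) to GMT, given a
   zone offset in minutes.  Assumption: positive offsets lie west of
   Greenwich, as the "est" sample entry suggests.  The result is wrapped
   into the range 0..1439; any change of date is ignored. */
int local_to_gmt(int local_minutes, int zone_offset)
{
    int gmt = (local_minutes + zone_offset) % MINUTES_PER_DAY;

    if (gmt < 0)
        gmt += MINUTES_PER_DAY;
    return gmt;
}
```

Under this assumption, noon "est" (offset 60*5) corresponds to 17:00 GMT, and noon "edt" (offset (60*5)-60) corresponds to 16:00 GMT.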

"_T_parse" structures for modifiers all have the flag _T_CONTROL set in their "type" field, as in

{ "years", _T_CONTROL | _T_OFFSET | _T_YEAR, 1, 4, 5 }
{ "weeks", _T_CONTROL | _T_OFFSET | _T_DAY,  7, 4, 5 }

These entries say that a string like "10 years" or "52 weeks" represents an offset from another time. Notice that the entry for "weeks" says that the offset is in terms of days and gives a "value" of 7 to indicate that "N weeks" is equivalent to 7*N days.
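
The scaling described above can be sketched in C as follows. The names "unit_entry", "offset" and "apply_modifier" and the unit codes are hypothetical; the real entries use the _T_DAY and _T_YEAR flags from <t_ctrl.h>, and the library's internal representation of an offset is not described here.

```c
/* Hypothetical unit codes standing in for the _T_DAY and _T_YEAR flags. */
enum unit { UNIT_DAY, UNIT_YEAR };

/* Hypothetical, trimmed-down modifier entry: the unit and its multiplier. */
struct unit_entry {
    const char *str;
    enum unit unit;
    int value;          /* multiplier applied to the preceding number */
};

/* Result of combining a number with a modifier such as "weeks". */
struct offset {
    enum unit unit;
    long amount;
};

/* Scale "count" by the entry's "value" and tag it with the entry's unit. */
struct offset apply_modifier(long count, const struct unit_entry *e)
{
    struct offset off;

    off.unit = e->unit;
    off.amount = count * e->value;
    return off;
}
```

With an entry of { "weeks", UNIT_DAY, 7 }, applying the modifier to 52 gives an offset of 364 days.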

The _T_TIME flag relates to the default time of day, specified as an argument when you call .TOSEC. The default time of day is used when a date/time string contains a date but not a time of day on that date. _T_TIME tells .TOSEC to ignore the default time of day argument, and to use the current time of day instead. For example, the definition of "hours" is

{ "hours", _T_CONTROL|_T_OFFSET|_T_HOUR|_T_TIME,
       1, 4, 5 }

which says that a string like "10 hours" should be interpreted as an hour offset from the current time of day, even if the call to .TOSEC specifies a different default time of day.

By default, modifiers are assumed to follow the components they modify, separated from the components by white space. If a modifier can follow a component without intervening white space, it should be flagged with _T_NO_SPACE, as in

{ "h", _T_CONTROL|_T_NO_SPACE|_T_HOUR, ...

This allows strings like "10h" to stand for 10 hours.

If a modifier can be followed by a component as well as preceded by one, the "value" field indicates what type of value is expected to follow the operator. For example, the full structure for the "h" operator is

{ "h", _T_CONTROL|_T_NO_SPACE|_T_HOUR, 
       _T_MIN|_T_OPTIONAL|_T_NO_SPACE, 1, 1 }

This indicates that if the "h" operator is followed by another value, that value should be interpreted as time in minutes; however, the _T_OPTIONAL shows that the second value is optional. With this definition, "10h30" would be interpreted as 10 hours, 30 minutes. Similarly, the default definitions give

{ "m", _T_CONTROL|_T_NO_SPACE|_T_MIN,
       _T_SEC|_T_OPTIONAL|_T_NO_SPACE, 1, 1 }

which allows for time measures like "10h30m25".
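
To illustrate how the optional follow-on values chain together, here is a small C sketch that accepts strings such as "10h", "10h30" and "10h30m25". It hard-codes the rules for "h" and "m" rather than reading them from a parsing table, and it handles only strings that begin with an hour count, so it is a simplification and not the method used by .TOSEC. The function names are hypothetical.

```c
#include <ctype.h>

/* Read an unsigned decimal number at *p, advancing *p past the digits.
   Returns -1 if no digit is present. */
static long read_number(const char **p)
{
    long n = 0;

    if (!isdigit((unsigned char)**p))
        return -1;
    while (isdigit((unsigned char)**p)) {
        n = n * 10 + (**p - '0');
        (*p)++;
    }
    return n;
}

/* Parse "10h", "10h30", "10h30m" or "10h30m25".  Stores hours, minutes
   and seconds; returns 1 on success and 0 if the string is malformed. */
int parse_hms(const char *s, long *h, long *m, long *sec)
{
    long n;

    *h = *m = *sec = 0;
    if ((n = read_number(&s)) < 0 || *s != 'h')
        return 0;
    *h = n;
    s++;                                /* skip "h" */
    if ((n = read_number(&s)) >= 0) {   /* optional minutes after "h" */
        *m = n;
        if (*s == 'm') {
            s++;                        /* skip "m" */
            if ((n = read_number(&s)) >= 0)
                *sec = n;               /* optional seconds after "m" */
        }
    }
    return *s == '\0';                  /* reject trailing characters */
}
```

Because each follow-on value is optional, the same routine accepts "10h" (hours only) and "10h30m25" (hours, minutes and seconds).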

The _T_PAIR flag is used for operators that might be paired with another operator. For example, in times like "10:30:25", the two ":" operators are paired. The corresponding structure should define the first operator in the "str" field and give the expected second operator in the "value" field, as in

{ ":", _T_CONTROL|_T_PAIR|_T_SEPARATOR|_T_HOUR|_T_MIN|_T_SEC,
       _T_HOUR|_T_MIN|_T_SEC|':', 1, 1 }

The "value" field consists of the ':' character (to show the expected second operator) OR'ed with _T_HOUR, _T_MIN, and _T_SEC to indicate a time value.
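
The idea of packing a character and flags into one "value" can be sketched in C as follows. The bit values here are invented for illustration; the real manifests are defined in <t_ctrl.h>, and this sketch simply assumes that the flags occupy bits above the character code, so that the two parts can be separated with a mask.

```c
/* Hypothetical bit values; the real flags are defined in <t_ctrl.h>.
   Assumption: flag bits lie above the bits used by the character code. */
#define X_HOUR     0x0100
#define X_MIN      0x0200
#define X_SEC      0x0400
#define X_CHARMASK 0x00FF

/* Combine type flags with the expected second operator character. */
int pair_value(int flags, char second_op)
{
    return flags | (unsigned char)second_op;
}

/* Recover the expected second operator from a packed value. */
char pair_operator(int value)
{
    return (char)(value & X_CHARMASK);
}

/* Recover the type flags from a packed value. */
int pair_flags(int value)
{
    return value & ~X_CHARMASK;
}
```

Packing X_HOUR|X_MIN|X_SEC with ':' and then unpacking returns the same flags and the same character.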

The last entry in a date/time parsing table should be a _T_parse structure with a NULL pointer in its "str" field. The "type" field should be a flag indicating the default for constructions consisting of three numbers, as in

12 25 93

The default can be any one of the following flags:

_T_P_YYMMDD
_T_P_MMDDYY
_T_P_DDMMYY

(Note that the default for parsing strings of the form "NN/NN/NN" is specified in the definition of the "/" operator.)
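
The effect of the three defaults can be sketched in C as follows. The enumeration and function names are hypothetical stand-ins for the _T_P_YYMMDD, _T_P_MMDDYY and _T_P_DDMMYY flags; the sketch only assigns the three numbers to fields and does not validate them or expand two-digit years.

```c
/* Hypothetical stand-ins for _T_P_YYMMDD, _T_P_MMDDYY and _T_P_DDMMYY. */
enum order { ORDER_YYMMDD, ORDER_MMDDYY, ORDER_DDMMYY };

struct ymd {
    int year;
    int month;
    int day;
};

/* Assign three numbers, in the order they appeared, to year/month/day. */
struct ymd assign_fields(int a, int b, int c, enum order order)
{
    struct ymd d;

    switch (order) {
    case ORDER_YYMMDD:
        d.year = a;  d.month = b; d.day = c;
        break;
    case ORDER_MMDDYY:
        d.month = a; d.day = b;   d.year = c;
        break;
    default:            /* ORDER_DDMMYY */
        d.day = a;   d.month = b; d.year = c;
        break;
    }
    return d;
}
```

Under ORDER_MMDDYY, the string "12 25 93" yields month 12, day 25, year 93; under ORDER_DDMMYY the same string would yield day 12 and month 25, which is not a valid date.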

We strongly recommend that people intending to write their own date/time parsing tables use .T_ENG as an example, following the forms shown there.

Using Date/Time Parsing Tables:

Your date/time parsing table should be declared as an array of _T_parse structures ending in a structure with a NULL "str" as described above. Compile this array as the definition of an external data object, giving the object a suitable name; then link the object with the program that wishes to use the new table.
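
A minimal sketch of such a table in C is shown below. So that it can stand alone, it uses a hypothetical structure "parse_entry" and invented flag values in place of "_T_parse" and the manifests from <t_ctrl.h>; a real table would include that header instead. The helper functions are likewise hypothetical and only show how the terminating entry can be located.

```c
#include <stddef.h>

/* Hypothetical stand-in for _T_parse; a real table would use <t_ctrl.h>. */
struct parse_entry {
    const char *str;
    long type;
    int value;
    int minchar;
    int maxchar;
};

/* Invented flag values standing in for _T_MONTH and _T_P_MMDDYY. */
#define X_MONTH    0x1L
#define X_P_MMDDYY 0x2L

/* External data object holding the table. */
struct parse_entry my_table[] = {
    { "january",  X_MONTH, 1, 3, 7 },
    { "february", X_MONTH, 2, 3, 8 },
    { NULL, X_P_MMDDYY, 0, 0, 0 }   /* terminator: NULL str, default order in type */
};

/* Count the entries that precede the terminating NULL entry. */
size_t table_length(const struct parse_entry *t)
{
    size_t n = 0;

    while (t[n].str != NULL)
        n++;
    return n;
}

/* Return the three-number default stored in the terminating entry. */
long table_default_order(const struct parse_entry *t)
{
    return t[table_length(t)].type;
}
```

For the table above, the length is 2 and the default order is X_P_MMDDYY.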

To use the table, the program must issue the function call

old = .tlang(ptr);

where "ptr" points to the table. The return value of .TLANG is a pointer to the previous date/time parsing table. From this point onward, functions like .TOSEC parse date/time strings using the new table rather than the old.

See Also:

expl b tz

expl b lib .tosec

expl b lib .tlang

Copyright © 1996, Thinkage Ltd.