Thinkage Ltd.
85 McIntyre Drive
Kitchener, Ontario
Canada N2R 1H6
Copyright © 1998 by Thinkage Ltd.
1. Introduction
   1.1 Organization of this Manual
   1.2 Caveats
2. Design Considerations
   2.1 Efficiency Goals
   2.2 Pointers
   2.3 Naming Conventions
      2.3.1 Library Programmers
   2.4 Function Entries
   2.5 Logical vs. Physical Blocks
3. I/O Data Structures
   3.1 The Units Vector
   3.2 The FILE* Table
   3.3 I/O Control Vectors (IOVs)
      3.3.1 Design Notes for the IOV
      3.3.2 Referring to IOVs
      3.3.3 TSX0 Functions
4. Set-Up and Tear-Down
   4.1 IOVs Created at Load Time
   4.2 Creating IOVs with OPEN
      4.2.1 Overview of the Open Operation
      4.2.2 Details of the Open Operation
      4.2.3 Opening Strings for I/O
   4.3 Tear-Down of IOVs
   4.4 Concatenated Files
5. Sequential Character Input
   5.1 Fetching the Character
   5.2 Fetching a New Record
   5.3 Return From CHA.GT
   5.4 Fetching a Character String
6. Sequential Character Output
   6.1 Outputting the Character
   6.2 Outputting Logical Records
   6.3 Reaching the End of the Logical Block
   6.4 Outputting a Character String
   6.5 End of File
   6.6 COMDECKS
7. Seeking Operations
   7.1 The "fpos_t" Structure Type
   7.2 The fseek and ftell Functions
   7.3 The Seeking Process
      7.3.1 Step 1: Get the Logical Block into Memory
      7.3.2 Step 2: Adjustments for Partitioned Records
      7.3.3 Step 3: Reading Past the High Water Mark
      7.3.4 Step 4: Finding the Right Record
      7.3.5 Step 5: Entering an Ambiguous State
      7.3.6 Step 6: Resolving the Ambiguous State
      7.3.7 Notes on Seek Operations
   7.4 Rewind Operations
8. Error Handling
   8.1 Error Tables
   8.2 Posting Error Messages
   8.3 C's "errno" Facilities
   8.4 Default Messages
   8.5 Retrieving a Posted Message
   8.6 The IO.ERR Function
      8.6.1 Clearing an Error
9. Auxiliary Functions
   9.1 Flushing a Stream
   9.2 Backing Up a Stream
   9.3 The UNGET Operation
   9.4 The REREAD Function
   9.5 Rewinding a Stream
   9.6 The FSFILE Operation: Spacing an Input File Forward
   9.7 Binary Data
   9.8 Other Functions Using the IOV
This manual describes the design and implementation of the UW Tools I/O library. The library is available to programs compiled with the B compiler of the UW Tools package, as well as to programs compiled with the single-segment (accommodation mode) versions of C and Pascal.
The main body of this manual is divided into the following sections:
The low level functions and data structures described in this document may change with each release of the UW Tools package. They may even change when patches are applied to fix bugs. Thus user programs should never call these functions or reference these data structures directly--use high level functions instead.
If a function or data object has a dot "." anywhere in its name and is not documented under "expl b lib", it may change without notice and should not be referenced directly by user programs. Such functions and data objects are documented here in The UW Tools B I/O Program Logic Manual solely to help you understand program dumps.
The format of the I/O control vector in release 8FW1.5 differs substantially from previous versions. New vector entries have been added and many old entries have new offsets within the vector.
This chapter looks at the design philosophy underlying the UW Tools library, particularly the I/O functions. Users who intend to create their own library functions should pay close attention to the principles provided here.
The I/O routines have been designed to optimize performance under the "character stream" orientation of the programming languages supported. The library provides facilities for non-sequential I/O operations (e.g. seeking, backspacing, rewinding, record and binary I/O). However, such operations are rare in comparison to operations like "read a character" or "write a character". Fast sequential I/O was the highest design priority, and non-sequential operations were not allowed to compromise the efficiency of the sequential character stream.
Many routines in the UW Tools library work with pointer arguments. This section describes differences between the ways that different programming languages use pointers.
In the B programming language, pointers can only address complete machine words. Typically then, B programs store pointers in the lower 18 bits of a machine word, leaving the upper halfword empty (i.e. zero).
C and Pascal programs, however, must be able to address individual characters within a machine word. This means they need 18 bits to reference the machine word and another 2 bits to reference one of the four bytes within the word. On GCOS8, the optimal format for such a pointer is to store the word address in the upper 18 bits of the word, and the byte offset in the next two bits. In some cases, it is also possible for the lower half-word to contain a non-zero SEGID; however, because the UW Tools library is only implemented in SS mode (accommodation mode), the vast majority of pointers have SEGID of zero. (For more information on SEGIDs, see the manuals describing the hardware and instruction set of your GCOS8 system.)
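The two pointer layouts described above can be modelled on an ordinary host machine. The following is a minimal sketch, not library code: it assumes a 36-bit word held in the low bits of a 64-bit integer, a SEGID of zero, and helper names (make_b_pointer, make_c_pointer, and so on) that are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 36-bit machine word, held in the low 36 bits of a 64-bit integer. */
typedef uint64_t word36;

#define HALF_MASK 0777777u          /* 18 bits: one halfword */

/* B format: word address in the lower 18 bits, upper halfword zero. */
word36 make_b_pointer(uint32_t word_addr)
{
    assert(word_addr <= HALF_MASK);
    return (word36)word_addr;
}

/* C/Pascal format: word address in the upper 18 bits, byte offset (0..3)
   in the next two bits.  The remaining bits (SEGID) are assumed zero. */
word36 make_c_pointer(uint32_t word_addr, unsigned byte_offset)
{
    assert(word_addr <= HALF_MASK && byte_offset < 4);
    return ((word36)word_addr << 18) | ((word36)byte_offset << 16);
}

/* Extract the word address from a C-format pointer. */
uint32_t c_pointer_word(word36 p)
{
    return (uint32_t)((p >> 18) & HALF_MASK);
}

/* Extract the byte offset from a C-format pointer. */
unsigned c_pointer_byte(word36 p)
{
    return (unsigned)((p >> 16) & 3u);
}

/* Convert a B pointer to the C format; a B pointer always addresses
   a whole word, so the byte offset is zero. */
word36 b_to_c_pointer(word36 b)
{
    return make_c_pointer((uint32_t)(b & HALF_MASK), 0);
}
```

For example, b_to_c_pointer(make_b_pointer(01234)) yields a word whose upper half is 01234 and whose byte offset is 0.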
Because of this difference between languages, all library functions that work with pointers must accept either pointer format. The rule is:
The UW Tools library contains two types of functions: user functions and internal functions.
Internal functions are identified by the presence of at least one dot character "." in their names, as in
.READ
ACC.FI
..IOVP
Similarly, the data objects used by the library divide into user objects and internal objects. Internal objects are identified by at least one dot character in their names.
Because the dot character is used to identify internal library functions and data objects, programmers should not put dots in the names of items in normal user code. This ensures that user names do not conflict with internal library names. The only "dot names" that you may safely define in your programs are ones which the documentation specifically says you may define (e.g. the .GET symbol whose presence or absence determines the way in which files are accessed).
It is not enough to avoid conflicts with existing dot names. Each new release of the UW Tools package may define new internal symbols with dots in their names, and if old source code conflicts with these new names, the result may be disastrous.
Programmers who write their own low level library may wish to use dot names for these library routines. Since users are not supposed to use dot names for user functions, giving a dot name to a low level library function ensures no conflict with user names. However, there is no guarantee that such dot names will not conflict with UW Tools internal functions in later releases of the package.
When writing low level library functions for use with C programs, you should also use names that cannot conflict with user names. The ANSI standard for C reserves all external names beginning with an underscore for library use. However, the ANSI standard lets users create local names (e.g. local variables or macros) whose names begin with an underscore followed by a lowercase letter. For safety's sake, therefore, it is better to choose different names for library routines: names beginning with two underscores or with an underscore followed by an uppercase letter. Users may not use such names for external or for local purposes.
NOTE: Programmers may know that the SS mode linker ignores the case of letters in external names. Therefore you might think there would be a conflict between an external name declared as _ABC and a local name declared as _abc. However, the C compiler pays attention to the case of letters, so the two names are distinct during compilation. The local name is only relevant during compilation--it is not seen by the linker--so the two names don't conflict.
If you are writing a low level library function that may be used in either B or C, it is best to use a dot name. Such names are not valid in C and are reserved in B, so you can be sure they do not conflict with user names. If you are writing the function in C, you can use #alias or #equate directives to create dot name symbols, even though such names are not valid in C.
In B, every function starts with a machine instruction that might be represented as
func tra func+1
In other words, this jumps to the second word of the function. Normally then, this instruction just continues on to the next instruction. However, consider two functions that start
X tra X+1
...
Y tra Y+1
The operation
Y = X
writes the TRA instruction from X over the TRA instruction in Y. The result is that whenever someone calls Y, the TRA instruction now jumps to X+1 instead of Y+1. Thus anyone calling Y actually executes function X instead. Programs may make use of this principle to "equate" functions.
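The "equate" principle can be imitated in portable C. The real mechanism overwrites an instruction word, which standard C cannot express, so the sketch below models the leading TRA word as a function pointer stored in a small structure. All names here (struct entry, call_entry, equate) are illustrative assumptions.

```c
/* Stand-in for the first word of a B function: a jump to the body. */
typedef int (*body_fn)(void);

struct entry {
    body_fn tra;    /* models "tra func+1" */
};

static int body_x(void) { return 1; }   /* the code at X+1 */
static int body_y(void) { return 2; }   /* the code at Y+1 */

struct entry X = { body_x };
struct entry Y = { body_y };

/* Calling a function means executing its first word, i.e. the jump. */
int call_entry(const struct entry *e)
{
    return e->tra();
}

/* Models the B assignment "Y = X": copy X's first word over Y's. */
void equate(struct entry *dst, const struct entry *src)
{
    dst->tra = src->tra;
}
```

After equate(&Y, &X), a call through Y runs the body of X, which is the effect the assignment has on the real TRA words.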
C functions do not have the same structure. Thus if you are programming a function in C but intend the function to be usable in both B and C, it's a good idea to add the TRA instruction at the beginning of the C function so that B programs can execute the assignment operation discussed above. You can do this by creating a small GMAP assembler module consisting of a TRA instruction followed by a reference to the C function. You could also use C code based on the following:
static dummy1()
{
    /* Code for the real function */
}

int (*func)() = { (int (*)()) ( (int) dummy1 + 0710000) };
This creates a static (local) function containing the actual function definition, then sets up an external (global) function consisting of a TRA instruction to the local function. This will be known by the external name func, but func is just a jump to the real function.
The UW Tools library distinguishes between logical and physical blocks. A physical block is a collection of records read in a single physical I/O operation. Each physical block may contain several logical blocks; the number of logical blocks per physical block is the same throughout the file.
Suppose that each physical block contains three logical blocks. Then successive physical I/O operations might read in logical blocks 0-2, then 3-5, and so on, in multiples of three. If a seek operation wanted to seek to a location in logical block 4, the physical read operation would read in the physical block containing logical block 4 (which means logical blocks 3-5).
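The arithmetic in this example can be written out as a short C sketch. The structure and function names are invented for illustration, and blocks are assumed to be numbered from zero as in the example.

```c
#include <assert.h>

/* The physical block that holds a given logical block, together with
   the range of logical blocks that one physical read brings in. */
struct phys_span {
    long physical;        /* index of the physical block           */
    long first_logical;   /* first logical block in that block     */
    long last_logical;    /* last logical block in that block      */
};

struct phys_span locate_logical_block(long logical, long per_physical)
{
    struct phys_span span;

    assert(logical >= 0 && per_physical > 0);
    span.physical = logical / per_physical;
    span.first_logical = span.physical * per_physical;
    span.last_logical = span.first_logical + per_physical - 1;
    return span;
}
```

With three logical blocks per physical block, locate_logical_block(4, 3) reports physical block 1 covering logical blocks 3 through 5.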
This chapter looks at fundamental I/O data structures: the Units Vector, the C FILE* table, and the I/O Control Vector.
The Units Vector is a B vector named "io.units". This vector is the key to finding information about any open I/O unit (i.e. I/O stream).
The Units Vector is indexed by B unit number. Therefore the vector contains entries indexed from -5 through 49, inclusive. For example, io.units[1] provides information on unit 1 (the standard output).
Each entry in the Units Vector is a pointer to the I/O Control Vector for the associated unit. If a particular unit is not open, the corresponding io.units entry is zero.
The FILE* Table is named "_iounit". It serves the same purpose as the Units Vector, but is designed for use in C programs rather than B programs. A C pointer declared as FILE* points into the FILE* table. In turn, the indicated entry in the table contains the B unit number associated with the IOV.
You might expect that the FILE* pointer would point directly at the I/O Control Vector, rather than this indirect reference through the FILE* table. The indirection through the table is necessary to support the C freopen function, which lets you reassign an existing FILE* pointer to indicate a new stream. The freopen function is implemented simply by changing the appropriate FILE* table reference to a new B unit number.
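The relationship between the two tables can be sketched in C as follows. This is a simplified model under stated assumptions: the IOV is reduced to a placeholder structure, the table sizes are arbitrary, and the names unit_slot, iov_from_file and reopen_entry are hypothetical rather than library symbols.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT_MIN   (-5)
#define UNIT_MAX   49
#define UNIT_COUNT (UNIT_MAX - UNIT_MIN + 1)

struct iov { const char *name; };       /* placeholder for a real IOV */

/* Units Vector: one IOV pointer per unit, NULL (zero) if not open. */
struct iov *units_storage[UNIT_COUNT];

/* Address of the Units Vector entry for a B unit number. */
struct iov **unit_slot(int unit)
{
    assert(unit >= UNIT_MIN && unit <= UNIT_MAX);
    return &units_storage[unit - UNIT_MIN];
}

int unit_is_open(int unit)
{
    return *unit_slot(unit) != NULL;
}

/* FILE* table: each entry holds a B unit number, not an IOV pointer. */
typedef int file_entry;
file_entry file_table[UNIT_COUNT];

/* Follow a FILE*-style pointer through the table to the IOV. */
struct iov *iov_from_file(const file_entry *fp)
{
    return *unit_slot(*fp);
}

/* The freopen idea: the caller's pointer stays the same; only the
   unit number stored in the table entry changes. */
void reopen_entry(file_entry *fp, int new_unit)
{
    *fp = new_unit;
}
```

Because the caller keeps the same pointer into the table, redirecting a stream needs no change to any variable in the user program.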
Pascal programs make use of the FILE* Table in a manner similar to C. Since B programs only need the Units Vector, they don't have a FILE* Table. All programs contain the Units Vector.
Each open stream has an associated I/O Control Vector, providing all the information needed to perform I/O on that unit. For the rest of this manual, "I/O Control Vector" will be abbreviated as IOV.
The IOV begins with various types of control information, including pointers to functions that can perform I/O on the associated stream. Buffers for the I/O stream are allocated after this control information.
After the buffers come various pieces of auxiliary information. For example, the name of the file is stored as a string in the auxiliary information at the end of the IOV; the control information at the beginning of the IOV contains a pointer to this filename. The auxiliary information also contains the mode string argument passed to the OPEN function that opened the associated stream.
The list below shows the lay-out of an IOV, including the symbolic name of each entry, the offset (shown in parentheses) and the purpose of each entry. With halfwords, offsets are marked with L for the lower halfword and U for the upper. Note that some offsets have more than one symbolic name, since some offsets have different purposes in different contexts. Entries described as "functions" are actually TRA hardware instructions jumping to functions that perform the specified actions.
PEF.FL  000001  Physical EOF of file (input)
EOF.FL  000002  EOF on input
BOF.FL  000004  Beginning of file on input
STR.FL  000010  Core-to-core I/O unit
NUL.FL  000020  Undefined unit
TTY.FL  000040  TTY-like device (record vs block I/O)
NEF.FL  000100  no eof or flush on close
LNO.FL  000200  line numbered input file
NGF.FL  000400  no GFRC I/O
SLW.FL  001000  outstanding slews (media 7)
MSS.FL  002000  message on error
RET.FL  004000  return from error
DEF.FL  020000  default I/O unit (i.e. opened by .setu.)
VIP.FL  040000  This is a vip
FLS.FL  100000  Block needs to be flushed
RAW.FL  200000  Raw type tty output.
OUT.FL  400000  output unit

BTS.FL  000001  .tell returns bytes from 0
WEF.FL  000002  Write EOF on fseek, flush
OFR.FL  000004  file can be read
OFW.FL  000010  file can be written
OFA.FL  000020  file was opened for append
CAP.FL  000040  append mode: seek to eof on output
PRE.FL  000100  pre read next block after write
SSF.FL  000200  Allow seeks on sequential files
NRW.FL  000400  file opened with no rewind
ZMO.FL  001000  return -1 on eof, not zero
FNV.FL  002000  .FSIZE is estimated, not definite
ZDU.FL  004000  Duplicate AFTNAM check suppressed on this iov
OPC.FL  020000  .openc re-using this iov; don't rslevec
EFD.FL  040000  write EOF rcw on all physical block output
file1+file2+file3
The upper half of FIL.CT holds the total number of files in the string. The lower half of FIL.CT holds the number of the file that is currently in use.
Pointers are generally in the upper half unless otherwise noted.
The IOV has been designed to optimize the normal flow of I/O operations on the stream. The most important aspect of this optimization is the presence of the function pointers in the control information section. These point to functions which are appropriate for performing operations on the IOV, according to the current state of the stream.
As a simple example, consider the function associated with CHA.PT, the function that outputs a character to the buffer. Suppose that the user opens a file as a Media 6 text file. The open operation therefore sets up the CHA.PT entry to point to a function that outputs a single character into a Media 6 format record, stored in the buffer of the IOV. Later on, the program may issue a SET.MC function to change media codes for the stream. One of the effects of SET.MC will be to change the CHA.PT pointer so that it points to a function that outputs the new media code rather than the old.
Note that this approach removes some overhead whenever the program wants to output a character, since the program doesn't have to choose different actions based on the current media code. The program just has to call the current CHA.PT function, in the knowledge that CHA.PT has already been set up to use the right media code. Thus the I/O routines don't have to check the media code each time a character is output; the IOV is set up appropriately whenever the media code changes, and things happen automatically after that.
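The following C sketch illustrates the idea of swapping the output function when the format changes. It is not the library's code: the two formats and their behaviour (one stores characters unchanged, the other stores them in upper case) are invented stand-ins for real media codes, and every name in the block is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

struct out_stream;
typedef void (*put_char_fn)(struct out_stream *, int);

/* Hypothetical formats standing in for real media codes. */
enum format { FORMAT_A, FORMAT_B };

struct out_stream {
    put_char_fn cha_pt;     /* current "output a character" function */
    char buf[64];           /* stand-in for the record buffer        */
    size_t len;
};

/* FORMAT_A (invented behaviour): store the character unchanged. */
static void put_as_is(struct out_stream *s, int c)
{
    if (s->len < sizeof s->buf)
        s->buf[s->len++] = (char)c;
}

/* FORMAT_B (invented behaviour): store the character in upper case. */
static void put_upper(struct out_stream *s, int c)
{
    put_as_is(s, toupper((unsigned char)c));
}

/* The decision is made once, here, when the format changes. */
void set_format(struct out_stream *s, enum format f)
{
    s->cha_pt = (f == FORMAT_B) ? put_upper : put_as_is;
}

void stream_init(struct out_stream *s, enum format f)
{
    s->len = 0;
    set_format(s, f);
}

/* The hot path: no test of the format, just an indirect call. */
void put_char(struct out_stream *s, int c)
{
    s->cha_pt(s, c);
}
```

The point of the design is visible in put_char: it contains no conditional at all, so the per-character cost does not grow with the number of formats supported.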
Changing media codes is just one of the actions that might change function pointers in the IOV. For example, suppose you are using a media code in which text records have a fixed length:
As this example shows, the IOV function entries may point to different functions, depending on the context of what has gone before. This eliminates the overhead that would be necessary if the output-a-character routine were a single function that had to check for a variety of contexts each time it was called. Fewer decisions have to be made during normal I/O because each I/O function knows it will only be called in a given context.
Because the IOV combines data with local functions specifically designed to manipulate that data, the IOV is an example of an Object, in the sense of Object-Oriented Programming.
Many library functions accept arguments specifying the stream on which you want to perform I/O. In the current release, the argument you pass can be any of the following:
The library identifies the nature of the passed argument by looking at its value. An integer less than 100 is taken to be a unit number. Any other value is taken to be a pointer to the IOV, unless it points into the block of memory occupied by the FILE* table, in which case it is taken to be a FILE*.
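The classification rule above can be sketched in C. This is only a model of the rule as stated: it assumes a host where a valid pointer, converted to a signed integer, is never less than 100, and the names arg_kind and classify_stream_arg are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum arg_kind { ARG_UNIT, ARG_FILE, ARG_IOV };

/* Decide how to interpret a stream argument.
   value       : the argument as passed (integer or pointer bits)
   table       : start of the FILE* table
   table_bytes : size of the FILE* table in bytes */
enum arg_kind classify_stream_arg(intptr_t value,
                                  const void *table, size_t table_bytes)
{
    uintptr_t addr, lo;

    if (value < 100)
        return ARG_UNIT;            /* small integer: a unit number */

    addr = (uintptr_t)value;
    lo = (uintptr_t)table;
    if (addr >= lo && addr - lo < table_bytes)
        return ARG_FILE;            /* points into the FILE* table  */

    return ARG_IOV;                 /* anything else: IOV pointer   */
}
```

Note that the unit test comes first, so negative unit numbers such as -5 are classified as units without any address comparison.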
Some of the functions referenced in the IOV are designed to be called with a TSX0 hardware instruction rather than the usual TSX1. These TSX0 functions have an abbreviated calling sequence that makes them quicker to use in appropriate situations. In particular, TSX0 functions don't perform the usual operations to save and restore the values in certain hardware registers; thus TSX0 functions can do their jobs more quickly, but they may only be called in situations where you've already saved any necessary register values.
For most TSX0 functions, there are corresponding TSX1 functions that can be called when the TSX0 versions are too quick-and-dirty to handle the situation. For example, ".READ" is a TSX1 version of the TSX0 function "..RD". Both do the same operation (see "expl b lib .read") but ..RD is faster in applicable situations.
The table below lists the TSX0 functions currently implemented. Function arguments are passed in various hardware registers, as noted. Some of the functions are invoked with an actual TSX0 instruction, others with an XED. In the table, IOVP stands for a standard B pointer to an IOV.
This chapter deals with the allocation and de-allocation of IOVs.
A few IOVs are created at the time the program is loaded. Each such IOV may be referenced by a symbol (SYMDEF):
All other IOVs are created through the OPEN function. This includes IOVs associated with files and with other I/O devices, like the SYSOUT queue and the console.
Note that the C fopen function and the Pascal openf procedure both call the UW Tools OPEN to do their work. OPEN may also be called during program set-up, to handle redirection or to deal with P* and the console in batch.
Roughly speaking, OPEN follows these steps:
Below we give a more detailed explanation of the steps executed by the OPEN function.
.READ also sets address register RD to point to the same place as .RD.IOV.
.WRITE also sets address register WR to point to the same place as .WR.IOV.
SETD.R -- read operations on disk files
SETD.W -- write operations on disk files
SETD.A -- append operations on disk files
SETT.R -- read operations on tape
SETT.W -- write operations on tape
SETT.A -- append operations on tape
These routines set up as many of the function pointers as possible. However, they may not be able to set up all the function pointers. For example, SETD.R cannot fill in entries that depend on media code--since the file hasn't been read yet, SETD.R doesn't know the media code of the first record. SETD.R does fill in the function pointers for any entries that are device dependent but not media code dependent.
SYSOUT in batch is a special case. If this OPEN operation opens SYSOUT in batch, SET.IO handles that directly without calling an auxiliary routine.
WR.MC0 WR.MC1 WR.MC2 WR.MC3 WR.MC4 WR.MC5 WR.MC6 WR.MC7 WR.M10
When OPEN is called to open strings for I/O, the process parallels the process for opening a file. However, OPEN does not call the SET.IO function to put appropriate function pointers into the IOV. Instead, it calls one of the following functions:
SETS.R -- open string for reading
SETS.W -- open string for writing
SETS.A -- open string for appending
All of these functions operate in a similar way.
The first step is to set up TALLYB pointers to the string being opened. These are set up in the IOV and are created regardless of whether the string is being opened for input or for output.
When the string is being opened for input, OPEN next sets up the CHA.GT entry of the IOV. This points to a function which fetches one character at a time from the string, but which refuses to read past a null byte ('*0'). It also sets up the STR.GT entry of the IOV; this points to a function which fetches an entire string, up to the next new-line. OPEN doesn't need to set up any of the other function entries in the IOV.
When the string is being opened for output, OPEN must set up the following entries:
The CLOSE function is in charge of closing a file and releasing IOVs. CLOSE follows these steps:
The UW Tools library lets you specify a concatenated file list in any context where a single input file is accepted. Such a list is written as
file1+file2+file3+...
where any number of files may appear in the list.
When you pass a file list as an argument to OPEN, OPEN breaks the list into separate file names and stores these filenames in the auxiliary space at the end of the IOV. Each file name is stored as a B string, with a null byte (*0) used to mark the end of the string. Each file name starts on a word boundary. For example, file1+file2 is stored in four machine words as
file 1*0*0*0 file 2*0*0*0
In earlier releases of the package, concatenated file strings were stored as one long string.
The FIL.NA entry of the IOV refers to the name of the file currently being read; the lower half of FIL.NA points to the file name string and the upper half tells the length of the string (in words). The upper half of the FIL.CT entry tells the total number of files in the concatenated list and the lower half tells the number of the file you're currently reading.
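The storage layout for a concatenated list can be sketched in C. This is an illustration under assumptions, not the library's routine: it assumes four bytes per machine word, models a 36-bit word in a 64-bit integer, and uses invented names (store_file_list, pack_fil_ct).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_WORD 4

/* Split "a+b+c" into null-terminated names, each starting on a word
   boundary and padded with null bytes.  Returns the number of words
   used, or 0 if the output area (out_words words) is too small.
   On success *file_count receives the number of names stored. */
size_t store_file_list(const char *list, char *out, size_t out_words,
                       size_t *file_count)
{
    size_t used = 0, count = 0;
    const char *p = list;

    for (;;) {
        size_t len = strcspn(p, "+");
        /* room for the name plus at least one null byte, rounded up */
        size_t words = (len + 1 + BYTES_PER_WORD - 1) / BYTES_PER_WORD;

        if (used + words > out_words)
            return 0;
        memset(out + used * BYTES_PER_WORD, 0, words * BYTES_PER_WORD);
        memcpy(out + used * BYTES_PER_WORD, p, len);
        used += words;
        count++;

        if (p[len] == '\0')
            break;
        p += len + 1;               /* skip the '+' separator */
    }
    *file_count = count;
    return used;
}

/* FIL.CT layout as described: total count in the upper halfword,
   current file number in the lower halfword of a 36-bit word. */
uint64_t pack_fil_ct(unsigned total, unsigned current)
{
    return ((uint64_t)(total & 0777777u) << 18) | (current & 0777777u);
}
```

Applied to file1+file2, store_file_list uses four words, matching the layout shown earlier in this section.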
When the library reaches end-of-file on one file in the list, it calls CLOSE to close the file's IOV. It then opens the next file in the list, re-using the same IOV in order to avoid fragmenting memory. Because the library uses the same IOV for all the files in the list, the IOV is allocated with the maximum size needed for any type of input device (disk file, tape, terminal, or console).
If one of the files in the concatenated list is $TTY, referring to the terminal, OPEN creates an indirect IOV for the file list. This IOV can refer to IOVs for files or to one of the predefined terminal IOVs, set up at load time.
This chapter describes the steps followed during a typical operation to read a single input character. For example, when you call the C getchar function or the B getc function, this chapter describes the low level I/O functions that are invoked to honor the "get character" request.
The high level I/O function jumps to the CHA.GT entry of the IOV associated with the file or device you want to read. This entry jumps in turn to a "get character" function for the media code of the input stream. Possible functions are:
GC.MC0 GC.MC1 GC.MC2 GC.MC3 GC.MC4 GC.MC5 GC.MC6 GC.MC7 GC.M10
Each of these returns the next character in the input stream. Normally, this is just the next character from the record that is in the input buffer. However, if the invoked function finds that it has reached the end of the current record, it transfers control to "GC.FCH".
GC.FCH has the job of fetching a new record, then returning the first character of the new record. To fetch the new record, it uses the REC.GT entry of the IOV. When REC.GT returns, GC.FCH uses CHA.GT from the IOV to get the first character of the new record. GC.FCH returns this character.
Whenever the I/O library functions know that they're going to have to get a new record the next time they want to read a character, they stuff the address of GC.FCH into the CHA.GT entry of the IOV. This ensures that the new record is automatically fetched the next time someone calls CHA.GT. The media code reading functions stuff GC.FCH into CHA.GT when they hit end-of-record.
The record level I/O routine stuffs GC.FCH into CHA.GT whenever it has to read a new record. The reasoning is that the new record may have a different media code from the previous record and therefore CHA.GT may have to be changed to refer to a different "get character" function.
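The interplay between CHA.GT and GC.FCH can be modelled in C with a function pointer that is re-pointed as the stream changes state. The sketch below is a simplified model: records are plain C strings held in an array, media codes are ignored, the end-of-file value is an arbitrary choice, and all names are hypothetical. The end-of-file function corresponds to the GT.EOF behaviour described later in this chapter.

```c
#include <stddef.h>

#define STREAM_EOF (-1)     /* illustrative end-of-file value */

struct in_stream;
typedef int (*get_char_fn)(struct in_stream *);

struct in_stream {
    get_char_fn cha_gt;             /* current "get character" function */
    const char *const *records;     /* stand-in for the input device    */
    size_t record_count;
    size_t next_record;
    const char *cur;                /* position in the current record   */
};

static int get_fetch_record(struct in_stream *s);

/* Once installed, every call returns end-of-file. */
static int get_eof(struct in_stream *s)
{
    (void)s;
    return STREAM_EOF;
}

/* Normal case: hand back the next character of the current record. */
static int get_in_record(struct in_stream *s)
{
    if (*s->cur != '\0')
        return (unsigned char)*s->cur++;
    /* End of record: install the fetching function and use it. */
    s->cha_gt = get_fetch_record;
    return s->cha_gt(s);
}

/* Plays the role of GC.FCH: fetch a record, then return its first
   character through whatever function now suits the new record. */
static int get_fetch_record(struct in_stream *s)
{
    if (s->next_record == s->record_count) {
        s->cha_gt = get_eof;
        return STREAM_EOF;
    }
    s->cur = s->records[s->next_record++];
    s->cha_gt = get_in_record;
    return s->cha_gt(s);
}

void stream_open(struct in_stream *s, const char *const *records, size_t n)
{
    s->records = records;
    s->record_count = n;
    s->next_record = 0;
    s->cur = "";
    s->cha_gt = get_fetch_record;   /* first read must fetch a record */
}

int stream_getc(struct in_stream *s)
{
    return s->cha_gt(s);
}
```

As in the library, the caller of stream_getc never tests for end-of-record; the state of the stream is encoded in which function the pointer currently names.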
The REC.GT entry of the IOV jumps to a function that gets a new record from the associated stream. The function depends on the type of device associated with the IOV. Possibilities are:
If GT.REC finds that it has reached the end of the logical block, it uses BLK.GT from the IOV to read a new logical block. In turn, BLK.GT calls one of the following:
When a new physical block must be read from a disk file, GT.PHY calls the RAW.IO function as specified in the IOV, which in turn calls a library function named "RD.DSK". RD.DSK performs the physical read operation with the system call appropriate to the environment, and adjusts the status to reflect the true degree of success of the operation, setting a good status if any data was read successfully.
When a new physical block must be read from a tape file, the operation is similar, but RAW.IO calls a library function named RD.TAP instead.
After a physical read operation, the library functions update the seek address for disks and the block count for tapes. They also set the logical block pointer (in the upper half of .BLOCK) to the top of the physical block (indicated in the lower half of .BLOCK).
RD.DSK and RD.TAP may call the function IO.ERR if they detect an error. IO.ERR may or may not write out an error diagnostic message, depending on the setting of the MSS.FL bit in FL.IOV. Then it either exits, or it returns the negative of the major status, depending on the setting of the RET.FL bit in FL.IOV. This return value is usually propagated back up the chain of library functions. However, in the case of end-of-file, library functions return 0 or -1, depending on the value of the ZMO.FL flag in F2.IOV.
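The decisions made by IO.ERR can be sketched in C. The flag values are taken from the flag lists given earlier in this manual; everything else is an assumption made for illustration, including the structure, the function names, the use of stderr, and the reading that a set bit means "do write the message" or "do return".

```c
#include <stdio.h>
#include <stdlib.h>

#define MSS_FL 002000u      /* FL.IOV: message on error         */
#define RET_FL 004000u      /* FL.IOV: return from error        */
#define ZMO_FL 001000u      /* F2.IOV: return -1 on eof, not 0  */

struct err_iov {
    unsigned fl_iov;        /* first flag word  */
    unsigned f2_iov;        /* second flag word */
};

/* Model of the error policy: optionally report, then either
   terminate or return the negative of the major status. */
int io_err(const struct err_iov *iov, int major_status, const char *text)
{
    if (iov->fl_iov & MSS_FL)
        fprintf(stderr, "%s\n", text);
    if (!(iov->fl_iov & RET_FL))
        exit(EXIT_FAILURE);
    return -major_status;
}

/* Value returned to the caller on end-of-file. */
int eof_return_value(const struct err_iov *iov)
{
    return (iov->f2_iov & ZMO_FL) ? -1 : 0;
}
```

A caller that sets RET.FL but not MSS.FL therefore gets a silent negative return value and is responsible for its own diagnostics.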
If CHA.GT receives a non-zero return value or logical end-of-file (indicated by a zero-length RCW), the CHA.GT entry in the IOV is set to the function "GT.EOF". This is a special function that always returns the end-of-file character. Thus, all subsequent "get character" operations return EOF.
The process of reading a string (e.g. with the B function getstr) is based on the facilities for reading a single character.
The STR.GT entry of the IOV points to a function that reads a string from the logical block. The function may be one of the following:
In simple cases, the routine indicated by STR.GT uses an EIS move (with translate if required) to read the substring from the logical block into the memory location specified by the calling function. The routine advances the working tally (.INREC) by the appropriate amount and calls REC.GT if the tally runs out. In more complicated cases (COMDECKS, print images, and times when UNGETC has been used to push back characters), GS.1CH calls CHA.GT repeatedly to read one character at a time until the requested string has been read.
This chapter describes the steps followed during a typical operation to write a single output character. For example, when you call the C putchar function or the B putc function, this chapter describes the low level I/O functions that are invoked to honor the "put character" request.
The first step is to call the CHA.PT function as specified in the IOV. This may indicate any of the following routines:
In all cases, the function edits horizontal and vertical slew characters, expands tabs, handles backspaces and line length overflows, and ends logical records on vertical slews. If the output must be written in BCD, ASCII characters are converted to BCD.
Once all the media code processing has taken place, the media-specific function calls "PC.RAW". PC.RAW places the processed characters sequentially into the record image via the working tally (.INREC in the IOV). The record image lies within the block image.
When the tally runs out, PC.RAW passes control to REC.PT to output the logical record. On terminals and the console, REC.PT is one of
PT.TTY or PT.CON
which perform the appropriate system calls to output the record. For blocked record devices, REC.PT is one of:
The PT.REC function builds the RCW (partitioned if necessary) for the last record, advances the available block pointer (.INBLK) and constructs tallies (.RECRD, .INREC) to this area.
In order for the above routines to make decisions about the lengths of blocks (e.g. for PT.RC2 to discard characters written past column 80) they construct their tallies one character too long. For example, PT.RC2 builds a tally of 81 characters, and PT.REC builds tallies one character longer than the space remaining in the block. PT.RC4 is the exception to this: it builds a tally that is exactly the size of the next logical block.
At the end of a logical block, the routines that output a record build the next logical block by calling "BLK.PT". BLK.PT increments the BSN and builds the BCW for the block. It then calls one of the following:
Except for PT.CUR, all of these advance the logical block pointer (the upper half of .BLOCK) to point to the next logical block.
The PT.BLK routine performs the physical write operation by calling RAW.IO+1. This invokes either WR.DSK to write to a disk or WR.TAP to write to tape. After the write, the function resets the logical block pointer to the top of the physical block.
After the write operation, PT.BLK calls IO.ERR if an error was detected. IO.ERR may or may not write out an error diagnostic message, depending on the setting of the MSS.FL bit in FL.IOV. Then it either exits or returns, depending on the setting of the RET.FL bit in FL.IOV.
When control returns to PT.BLK, it sets the available block pointer (.INBLK) to point to the top of the logical block.
The process of outputting a string (e.g. with the B function putstr) is based on the facilities available for outputting characters.
The low level library routine for outputting a string is called "PT.DES". This breaks the string into substrings, breaking at any slew characters or unprintables. For example, consider the string
Hello*nHow are you?
where '*n' represents a new-line character, written as '\n' in the C programming language. PT.DES would break this into
Hello
*n
How are you?
PT.DES calls CHA.PT to output unprintable characters, and special characters like the '*n'. For simple strings (like "Hello" and "How are you?" above), PT.DES builds an EIS descriptor and a length, then passes them to DES.BU as specified in the IOV. The DES.BU routine is one of
In easy cases, the DES.BU routine uses an EIS move (with translate if required) to put the substring into the record image. It advances the working tally (.INREC) by the appropriate amount and calls REC.PT if the tally runs out. For print images and COMDECKS, PD.1CH passes each string character individually to CHA.PT to facilitate overflow processing, and blank and slew optimization.
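The way PT.DES divides a string can be illustrated with a short C sketch. It assumes that "simple" text means a run of printable characters as judged by the C isprint function, which may not match the library's exact rule, and the names struct segment and split_for_output are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* One piece of the string: either a run of printable characters
   (suitable for a block move) or a single special character. */
struct segment {
    const char *start;
    size_t length;
    int is_special;     /* 1 for a slew or unprintable character */
};

/* Split s into at most max segments; returns the number produced. */
size_t split_for_output(const char *s, struct segment *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0' && n < max) {
        const char *start = s;

        if (isprint((unsigned char)*s)) {
            /* gather the whole printable run */
            while (*s != '\0' && isprint((unsigned char)*s))
                s++;
            out[n].is_special = 0;
        } else {
            /* special characters are handed over one at a time */
            s++;
            out[n].is_special = 1;
        }
        out[n].start = start;
        out[n].length = (size_t)(s - start);
        n++;
    }
    return n;
}
```

In terms of the manual, segments with is_special set would go to CHA.PT one character at a time, while the other segments would be passed to the DES.BU routine as a descriptor and a length.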
Functions like CLOSE and EOF need to be able to output an end-of-file to an output stream. To do this, they call EOF.FI from the IOV, which is one of the following functions:
The COMDECK writer, PC.CDK, performs some special processing. PC.CDK is invoked when you attempt to write Media Code 1 using character stream output functions. Since COMDECK records have a fixed size, and Media Code 1 is also the normal media code to use for variable length binary, special processing is required to reconcile fixed lengths within a variable format.
First, PT.MC1 loads REC.PT to do normal variable length record processing. When the first character is passed to PC.CDK, it starts a COMDECK and replaces the current CHA.PT with a pointer to a routine inside PC.CDK. It also replaces REC.PT with a routine that:
As a result, once a COMDECK is started, it is terminated by the next call to REC.PT. Normally this happens because of a call to CLOSE, but it can also happen because of a call to PUTREC, PUTBIN, or SET.MC.
This chapter looks at the functions performed in "seek" and "tell" operations: jumping to a different position in the file, and determining your current position after a "seek".
The C fpos_t structure type is used to represent a position inside a file, for use with the fgetpos and fsetpos functions. An fpos_t structure is four machine words long, with the following components:
file1+file2+file3
FP.FIL tells whether you are looking at the first file, the second file, etc.
The symbol FP.SZ has the value of the size of an fpos_t structure, represented as a B "size".
The two low level routines that work with fpos_t structures are ".FGPOS" and ".FSPOS".
If you are using a concatenated list of files, the current version of the library only lets you execute .FSPOS on the first file in the list. In other words, the file number FP.FIL must be 1. You may not seek to another file, and you may not seek back to the first file if you have moved on to a different file.
If you call .FSPOS with the value -1 in the FP.REC entry, .FSPOS interprets the FP.CHR value as a character offset from the beginning of the block, rather than a character offset within a particular record of the block.
The C functions fseek and ftell are older, UNIX-inspired functions that use a single integer to represent a position in a file. For the purposes of this manual, we will call such an integer a seek integer.
The format of a seek integer depends on the type of file on which you intend to seek.
In both cases, seek integers may not allow enough range to address every character in a file, if the file is big enough.
Both types of seek integers can be translated directly into fpos_t structures. However, there is no easy way to translate an fpos_t structure into a seek integer, and indeed, it may be impossible to translate some fpos_t structures into seek integers.
The current implementation of fseek simply translates the seek integer into an fpos_t structure and calls .FSPOS.
The sections that follow describe the steps required to seek to a new file position.
.FSPOS begins by calling a function named ".LBSEEK" (which is actually truncated by the linker to ".LBSEE"). It's the job of .LBSEEK to make sure that the desired logical block is present in memory.
For output units, .LBSEEK must flush any pending output before changing positions within the stream. Thus .LBSEEK calls the appropriate routines from the IOV to finish off the current output record, flush it to the stream, and write an end-of-file mark, if appropriate. For input units, no special processing is required.
Next, .LBSEEK checks to see if the desired logical block is already in memory. If so, .LBSEEK simply adjusts the relevant IOV pointers to point to the block and returns.
If the logical block is not in memory, .LBSEEK must read in the specific piece of the file that contains the block. It does this by calling GT.SPC. After GT.SPC returns, .LBSEEK adjusts the IOV pointers to point to the appropriate logical block within the data just read.
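The decisions made by .LBSEEK can be summarized in a C sketch. Everything in it is an assumption made for illustration: the structure iov_sketch, the field names, the constant BLOCKS_PER_READ, and the helper read_piece (which stands in for GT.SPC) do not come from the library, and block numbers are assumed to be non-negative.

```c
#define BLOCKS_PER_READ 4   /* assumed: logical blocks fetched by one read */

struct iov_sketch {
    int  is_output;     /* nonzero for an output unit */
    int  pending;       /* nonzero if unwritten output is buffered */
    long first_block;   /* first logical block held in memory, -1 if none */
    long cur_block;     /* block the IOV pointers currently refer to */
    int  flushes;       /* counts flushes, for illustration only */
    int  reads;         /* counts physical reads, for illustration only */
};

/* Stand-in for finishing the record and flushing it to the stream. */
static void flush_output(struct iov_sketch *iov)
{
    iov->pending = 0;
    iov->flushes++;
}

/* Stand-in for GT.SPC: fetch the piece of the file holding the block. */
static void read_piece(struct iov_sketch *iov, long block)
{
    iov->first_block = block - block % BLOCKS_PER_READ;
    iov->reads++;
}

void lbseek(struct iov_sketch *iov, long block)
{
    /* Output units flush pending output before the position changes. */
    if (iov->is_output && iov->pending)
        flush_output(iov);

    /* Read only when the block is not already in memory. */
    if (iov->first_block < 0 ||
        block < iov->first_block ||
        block >= iov->first_block + BLOCKS_PER_READ)
        read_piece(iov, block);

    /* Adjust the "pointers" to refer to the requested block. */
    iov->cur_block = block;
}
```

A second seek to a block inside the same piece costs no additional read in this model, which is the point of the in-memory check.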
.FSPOS must next check to see if the desired logical block is in the middle of a partitioned record. If so, .FSPOS must find the start of the record to determine the media code of the record.
To find the start of the record, .FSPOS calls .LBSEEK repeatedly to walk backwards through the file. The partition number of the current block indicates the number of blocks to walk backwards (modulo 1024). Once the search reaches the start of the record, .FSPOS makes note of the media code, then calls .LBSEEK once more to get the desired logical block into memory.
The .LBSEEK function only works up to the high water mark of the file, as specified by HI.WAT in the IOV. If the desired seek position lies farther on, .LBSEEK only goes as far as the high water mark. From that point onward, .FSPOS reads through subsequent records one at a time (using REC.GT) until it reaches the desired logical block.
This process is necessary because the library does not automatically determine the end-of-file position. The library knows that the file goes as far as the high water mark, but it doesn't know how much farther the file extends. Thus .FSPOS must read one record at a time to make sure that none of those records represents end-of-file and that the file has no invalid contents.
If .FSPOS reaches end-of-file before reaching the desired seek position, there are two cases:
At this point, .FSPOS has finally managed to ensure that the correct logical block is in memory. It now uses REC.GT to count through the records in the block until it finds the desired record (specified by FP.REC in the fpos_t structure).
At this point, .FSPOS has identified the record containing the target file position. However, .FSPOS can't directly move to the desired character position. The reason is that the library can't tell whether the next operation on the unit will be a read or a write; therefore the library doesn't know whether to set up the IOV for reading or writing.
As a result, the library stores the value of the character offset FP.CHR in the IOV and changes various IOV entries to point to specialized functions:
CHA.GT points to GC.AMB
CHA.PT points to PC.AMB
DES.BU points to PD.AMB
All of the ".AMB" functions are specialized ones to handle the ambiguous state of the IOV. (Note that it is not necessary to change the STR.GT function pointer, since GETSTR performs at least one CHA.GT to make sure a proper record has been set up and that the correct STR.GT function has been initialized.)
The IOV remains in its ambiguous state until the program attempts a read or write operation on the unit, thereby activating one of the ".AMB" functions in the IOV.
If the program activates the reading function GC.AMB, the function uses CHG.RD to change the IOV to read mode. Then it retrieves the stored character offset and reads that many characters into the current record. Next, the file is positioned to read the appropriate character. The function then restores the previous versions of CHA.GT, CHA.PT, and DES.BU and proceeds with the requested read operation.
If the program activates one of the writing functions (PC.AMB or PD.AMB), the situation is more complicated. First, the activated function retrieves the stored character offset. Next it allocates a buffer and reads characters from the file into the allocated buffer, up to the character before the designated offset. At this point, the allocated buffer contains the partial record up to the seek position. The activated function then calls CHG.WR to change the IOV to write mode, and uses DES.BU to output the partial record from the buffer to the logical block.
At the end of all this, the IOV is in write mode and is ready to write to the specified seek position. The activated function restores the previous versions of CHA.GT, CHA.PT, and DES.BU, and proceeds with the requested write operation.
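The ambiguous state is essentially a deferred decision implemented by swapping function pointers. The C sketch below models that idea on a single in-memory record. The structure and all function names are invented for illustration, DES.BU is left out, and the write case is simplified: because the record is already in memory here, no buffer has to be allocated and re-output as the real PC.AMB and PD.AMB do.

```c
#include <stddef.h>
#include <string.h>

struct iov_sketch;
typedef int (*get_fn)(struct iov_sketch *);
typedef int (*put_fn)(struct iov_sketch *, int);

struct iov_sketch {
    get_fn cha_gt, saved_gt;   /* live and saved "get character" entries */
    put_fn cha_pt, saved_pt;   /* live and saved "put character" entries */
    size_t fp_chr;             /* stored character offset */
    size_t pos, len;           /* position in, and length of, the record */
    int    writing;            /* 0 = read mode, 1 = write mode */
    char   record[32];
};

static int gc_normal(struct iov_sketch *iov)
{
    return iov->pos < iov->len ? (unsigned char)iov->record[iov->pos++] : -1;
}

static int pc_normal(struct iov_sketch *iov, int ch)
{
    if (iov->pos >= sizeof iov->record)
        return -1;
    iov->record[iov->pos++] = (char)ch;
    if (iov->pos > iov->len)
        iov->len = iov->pos;
    return ch;
}

/* Leave the ambiguous state: pick a mode, apply the stored offset,
   and put the saved entries back. */
static void resolve(struct iov_sketch *iov, int writing)
{
    iov->writing = writing;
    iov->pos = iov->fp_chr;
    iov->cha_gt = iov->saved_gt;
    iov->cha_pt = iov->saved_pt;
}

static int gc_amb(struct iov_sketch *iov)
{
    resolve(iov, 0);
    return iov->cha_gt(iov);       /* proceed with the requested read */
}

static int pc_amb(struct iov_sketch *iov, int ch)
{
    resolve(iov, 1);
    return iov->cha_pt(iov, ch);   /* proceed with the requested write */
}

void iov_init(struct iov_sketch *iov, const char *text)
{
    size_t n = strlen(text);

    memset(iov, 0, sizeof *iov);
    iov->len = n < sizeof iov->record ? n : sizeof iov->record;
    memcpy(iov->record, text, iov->len);
    iov->cha_gt = gc_normal;
    iov->cha_pt = pc_normal;
}

/* Record the offset and enter the ambiguous state. */
void seek_in_record(struct iov_sketch *iov, size_t offset)
{
    if (iov->cha_gt != gc_amb) {   /* not already ambiguous */
        iov->saved_gt = iov->cha_gt;
        iov->saved_pt = iov->cha_pt;
    }
    iov->fp_chr = offset;
    iov->cha_gt = gc_amb;
    iov->cha_pt = pc_amb;
}
```

After seek_in_record, the first call through cha_gt or cha_pt decides the mode, and every later call goes straight to the normal entry with no further test.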
The CH.NL entry of the IOV is set to -1 after a seek operation, since the previous character count is not known. If you then attempt an UNGETC followed by an .FGPOS to determine the position, you will get an error unless you are at EOF.
If you are at EOF and CH.NL is -1, .FGPOS must determine the character position by looking at the record just before the EOF. Therefore .FGPOS looks at the previous record and counts characters to determine the character offset.
Combining seek operations with ungetc can confuse .FGPOS. For example, if you seek to the beginning of a record, then ungetc a character and call fgetpos, the result is usually an error condition, since .FGPOS can't figure out where it is.
Rewind operations are similar to seeking to the beginning of the file, with one difference: instead of putting ".AMB" functions into the IOV, the rewind operation puts in special functions named GC.REW, PC.REW, and so on. These functions simply reinitialize the IOV for reading or writing, depending on the nature of the next I/O call.
This chapter looks at library facilities for error handling and recovery from errors.
An error number is a 36-bit quantity representing a type of error recognized by the library. The upper 18 bits of the error number are an index into a list of error tables; the lower 18 bits refer to an error in that table.
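On a host with 64-bit integers, the two halves of an error number can be packed and unpacked as in the following C sketch. The function names are illustrative only and are not part of the library.

```c
#include <stdint.h>

#define HALF_BITS 18
#define HALF_MASK ((UINT64_C(1) << HALF_BITS) - 1)   /* 0777777 octal */

/* Build a 36-bit error number from a table index and an error code. */
uint64_t make_errnum(uint32_t table, uint32_t code)
{
    return ((table & HALF_MASK) << HALF_BITS) | (code & HALF_MASK);
}

/* Upper 18 bits: index into the list of error tables. */
uint32_t errnum_table(uint64_t errnum)
{
    return (uint32_t)((errnum >> HALF_BITS) & HALF_MASK);
}

/* Lower 18 bits: the error within that table. */
uint32_t errnum_code(uint64_t errnum)
{
    return (uint32_t)(errnum & HALF_MASK);
}
```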
An error table is a GMAP vector used in issuing error messages. The .STRER function maintains a list of pointers to the error tables defined by the library, so that each table may be referred to by an index into this list.
The first word of the table is a pointer to a function which can "decode" the rest of the table. In order to issue an error message from the table, you invoke this function, passing it a pointer to the table and an error number, indicating which message you want to issue. It's the job of the function to use the bottom 18 bits of the error number to find the appropriate message within the table. Note that different tables may have different formats and therefore different functions to work with those formats.
The most common table format is identified by the name ".ER.TA". It has the form:
At present, the library uses three error tables with this format:
Programmers writing new library routines may create their own .ER.TA error tables or may create new error table formats. To add these to the library, you have to add your table to a list that appears in the library's .STRER function. .STRER assigns an index value to each such table, and each table gets its own range of error numbers.
Since the value of an error number tells both the error table index and the number of an error in that error table, .STRER can locate an error message just from the error number.
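The lookup path from an error number to a message can be sketched in C as follows. Only two things are taken from the description above: the decoder function sits at the front of each table, and the top 18 bits select the table while the bottom 18 bits select the message. The rest of the table layout, the message texts other than "invalid checksum", and every identifier are assumptions made for the example.

```c
#include <stddef.h>

struct err_table;
typedef const char *(*decode_fn)(const struct err_table *, unsigned long long);

/* The first member is the decoder, mirroring "the first word of the
   table"; the remaining layout is invented for this sketch. */
struct err_table {
    decode_fn decode;
    size_t count;
    const char *const *messages;
};

/* A decoder for a simple "array of strings" table format. */
static const char *decode_simple(const struct err_table *t,
                                 unsigned long long errnum)
{
    unsigned long long code = errnum & 0777777ULL;   /* bottom 18 bits */

    return code < t->count ? t->messages[code] : NULL;
}

const char *const io_messages[]  = { "no error", "invalid checksum" };
const char *const usr_messages[] = { "example message A", "example message B" };

static const struct err_table io_table  = { decode_simple, 2, io_messages };
static const struct err_table usr_table = { decode_simple, 2, usr_messages };

/* The list of tables; a table's position in the list is its index. */
static const struct err_table *const tables[] = { &io_table, &usr_table };

const char *lookup_message(unsigned long long errnum)
{
    unsigned long long index = (errnum >> 18) & 0777777ULL;   /* top 18 bits */

    if (index >= sizeof tables / sizeof tables[0])
        return NULL;
    return tables[index]->decode(tables[index], errnum);
}
```

Because each table carries its own decoder, a table with a different internal format can be added to the list without changing lookup_message.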
When a library function detects an error, it calls a function named .EPOST to post the error message. The posting process doesn't display the message directly to the user; it simply stores the message so that it can be retrieved later. Thus library functions may post an error message at any time during the error handling and recovery process, presumably at a point when the library can provide the user with the maximum amount of useful information about the error. The posted message can then be output to the user later on, if and when the recovery process determines that the message should be displayed.
The .EPOST function takes at least two arguments:
In addition, .EPOST takes a variable length list of argument values corresponding to any placeholders in the format string.
.EPOST calls printf to format the message, then associates the formatted message with the given error number. In this way, the message is posted in connection with the error number.
.EPOST only keeps one posted message at a time. The next call to .EPOST overwrites the previous posted message, whether the new call specifies the same error number or a different one. Thus the posted error message is always the most recent one generated by the library.
The ANSI standard for C requires implementations to support a simple errno concept. A non-zero value assigned to the variable errno indicates an error condition.
In general, library functions should not assign error numbers directly to errno. Instead, they should call .EPOST. .EPOST automatically assigns the value of its error number argument to errno.
B source code can refer to the errno variable using the name ".ERRNO". This is just another name for the same data object.
The problem with C's errno is that there is only one for the whole program. Asynchronous conditions (e.g. the user pressing BREAK) can replace the errno value set by an I/O error with a different value. Therefore, you can't depend on the current value of errno reflecting the error that originally triggered the error condition. It is best to use functions like .EPOST instead, which keep track of a separate error number for each IOV and which do not get overwritten during asynchronous interrupts. When writing your own functions, you can also indicate errors by passing back "error status" return values rather than assigning a value to errno.
In addition to the posted error message, .EPOST maintains a list of default error messages for each error number. If you call .EPOST without a format string (just an error number), .EPOST gets rid of the posted error message and substitutes the default message.
The default messages may be more general than specifically posted error messages; they cannot provide the sort of specific information you can put into a posted message. However, the default message is often enough to tell the user what went wrong.
To obtain the posted message associated with a particular error number in a C program, use the __STRERROR function (with two underscores at the beginning). This function takes an error number as its argument and returns the "best" error message associated with that error number. For example,
error(__strerror(number));
uses __STRERROR to obtain the "best" error message associated with that error number. If the error number matches the error number of the currently posted message, __STRERROR returns the posted message; otherwise, it returns the default message associated with the error number.
B programs may invoke __STRERROR under the name ".STRER". Both names refer to the same function.
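The interaction between posting, default messages, and retrieval can be modelled with a few lines of C. This is a sketch under simplifying assumptions, not the library's code: the names epost, best_message, and default_message are invented, error numbers are plain ints rather than 36-bit values, the default texts are made up, and a single global slot holds the posted message.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

static char posted_msg[128];   /* the one posted message */
static int  posted_num;        /* its error number; 0 = nothing posted */

/* Hypothetical default texts, chosen only for the example. */
static const char *default_message(int num)
{
    switch (num) {
    case 7:  return "I/O error";
    case 9:  return "file not found";
    default: return "unknown error";
    }
}

/* Post a formatted message for num, replacing any earlier one.
   A NULL format discards the posted message, so that the default
   message is used instead. The error number is also stored in errno. */
void epost(int num, const char *fmt, ...)
{
    if (fmt == NULL) {
        posted_num = 0;
        posted_msg[0] = '\0';
    } else {
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(posted_msg, sizeof posted_msg, fmt, ap);
        va_end(ap);
        posted_num = num;
    }
    errno = num;
}

/* Return the posted message if it belongs to num, else the default. */
const char *best_message(int num)
{
    return (posted_num != 0 && posted_num == num) ? posted_msg
                                                  : default_message(num);
}
```

For example, after epost(7, "bad block %d", 12), best_message(7) yields "bad block 12", while best_message(9) still yields the default text for error 9.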
In addition to __STRERROR, there are two other functions which obtain error messages:
Since STRERROR is recognized by the ANSI standard, it should be used in programs that require maximum portability. However, __STRERROR (with two leading underscores) returns the posted error message whenever appropriate, and therefore provides better quality error messages overall.
When an I/O function encounters an error on a stream, it calls the IO.ERR function to deal with that error. IO.ERR is invoked with the following arguments:
If the flags in the IOV indicate that the program should terminate on the encountered I/O error, IO.ERR calls .EPOST to post the error message. IO.ERR then outputs the message to the standard error stream and terminates the program.
If the flags in the IOV indicate that the program would like to attempt recovery from the error, IO.ERR begins by putting the indicated IOV into an error state. It does this by copying all the function pointers from the IOV to a save area referenced by the ERR.SV entry in the IOV. IO.ERR then replaces all these function pointers with pointers to error functions. The error functions simply return an end-of-file indication whenever they are called. In this way, any I/O operation executed on an IOV in an error state simply results in EOF.
After putting the IOV into an error state, IO.ERR calls .EPOST to post the error message, then returns to the user program. The user program may identify the type of error encountered on the unit with the function call
errnum = f.err(unit);
To issue an error message, the program may call
io.err(unit);
specifying the unit where the error occurred. Note that in this call to IO.ERR, you do not specify an error number or a message.
The format of a diagnostic message issued by IO.ERR is
file: location: message
as in
user/myfile: llink 10: invalid checksum
When the user program wishes to clear the error state of an IOV, it calls a standard function (like C's clearerr), which in turn invokes the CLR.ER function entry in the IOV. The default version of CLR.ER simply restores the saved function pointers from ERR.SV, to put the IOV back into a normal state. For some types of errors, CLR.ER may point to a different function: one that does special processing to recover from the error, before restoring the function pointers.
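Entering and leaving the error state amounts to saving a set of function pointers, substituting stubs that report end-of-file, and later copying the saved set back. The C sketch below shows that mechanism with only two entries. The structure layout and all names are assumptions for illustration; the member err_sv merely plays the role of the ERR.SV save area, and clear_error models only the default CLR.ER behaviour.

```c
#include <stdio.h>   /* for EOF */

struct iov_sketch;
typedef int (*get_fn)(struct iov_sketch *);
typedef int (*put_fn)(struct iov_sketch *, int);

struct entries {
    get_fn cha_gt;
    put_fn cha_pt;
};

struct iov_sketch {
    struct entries fn;       /* live function entries */
    struct entries err_sv;   /* save area, in the role of ERR.SV */
    int in_error;            /* nonzero while in the error state */
    int errnum;              /* error recorded for the unit */
    int next;                /* state for the demonstration reader */
};

/* Error functions: every operation simply reports end-of-file. */
static int get_eof(struct iov_sketch *iov) { (void)iov; return EOF; }
static int put_eof(struct iov_sketch *iov, int ch) { (void)iov; (void)ch; return EOF; }

/* Demonstration entries: the reader yields 'a', 'b', 'c', ... */
static int get_letter(struct iov_sketch *iov) { return 'a' + iov->next++; }
static int put_ok(struct iov_sketch *iov, int ch) { (void)iov; return ch; }

void iov_init(struct iov_sketch *iov)
{
    iov->fn.cha_gt = get_letter;
    iov->fn.cha_pt = put_ok;
    iov->err_sv = iov->fn;
    iov->in_error = 0;
    iov->errnum = 0;
    iov->next = 0;
}

void enter_error_state(struct iov_sketch *iov, int errnum)
{
    if (!iov->in_error) {        /* save the normal entries only once */
        iov->err_sv = iov->fn;
        iov->in_error = 1;
    }
    iov->fn.cha_gt = get_eof;
    iov->fn.cha_pt = put_eof;
    iov->errnum = errnum;
}

/* Default clearing behaviour: restore the saved entries. */
void clear_error(struct iov_sketch *iov)
{
    if (iov->in_error) {
        iov->fn = iov->err_sv;
        iov->in_error = 0;
        iov->errnum = 0;
    }
}
```

The guard in enter_error_state matters: without it, a second error on the same unit would overwrite the saved entries with the error stubs, and clearing the error could never restore normal operation.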
This chapter looks at various auxiliary functions associated with I/O. For further information on any of these functions, see the explanations under "expl b lib".
The FLUSH function calls FLS.RD for input units and FLS.WR for output units.
FLS.RD uses successive calls to CHA.GT to fetch characters up to and including the next new-line.
FLS.WR causes a logical end-of-record on the output unit by calling PT.REC.
The BACKSP function backs up the working tally .INREC by one character (if possible).
The UNGETC function saves the current value of CHA.GT in the IOV under the name of "UNGT.F". It also keeps a list of all the characters that have been pushed back into the stream with UNGETC; the UNGT.C IOV entry points to this list. UNGETC then changes CHA.GT to point to a routine that obtains characters from the UNGT.C list rather than from actual input, and changes STR.GT to point to GS.1CH so that "get string" operations get characters one at a time through the special CHA.GT.
When the special CHA.GT reaches the end of the list of UNGT.C characters, it restores the old CHA.GT from UNGT.F. The STR.GT function is left as GS.1CH until the next record is obtained.
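The pointer swap performed by UNGETC can be sketched in C. The sketch is an approximation made for illustration: the structure and function names are invented, the pushed-back characters are kept in a small fixed array and returned in last-in, first-out order (an assumption borrowed from C's ungetc), the old entry is restored as the last pushed-back character is handed out, and the STR.GT and GS.1CH handling is omitted.

```c
#include <stdio.h>   /* for EOF */

#define UNGET_MAX 8  /* assumed capacity of the push-back list */

struct iov_sketch;
typedef int (*get_fn)(struct iov_sketch *);

struct iov_sketch {
    get_fn cha_gt;           /* live "get character" entry */
    get_fn ungt_f;           /* saved entry, in the role of UNGT.F */
    int ungt_c[UNGET_MAX];   /* pushed-back characters, as with UNGT.C */
    int ungt_n;              /* number of characters in the list */
    const char *input;       /* remaining real input */
};

/* Normal entry: read from the real input. */
static int get_input(struct iov_sketch *iov)
{
    return *iov->input != '\0' ? (unsigned char)*iov->input++ : EOF;
}

/* Special entry: hand back pushed characters, then restore the old entry. */
static int get_ungot(struct iov_sketch *iov)
{
    int ch = iov->ungt_c[--iov->ungt_n];

    if (iov->ungt_n == 0)
        iov->cha_gt = iov->ungt_f;
    return ch;
}

void iov_init(struct iov_sketch *iov, const char *text)
{
    iov->cha_gt = get_input;
    iov->ungt_f = get_input;
    iov->ungt_n = 0;
    iov->input = text;
}

/* Push ch back; returns ch, or EOF if the list is full. */
int unget(struct iov_sketch *iov, int ch)
{
    if (iov->ungt_n == UNGET_MAX)
        return EOF;
    if (iov->ungt_n == 0) {          /* first push: swap the entry */
        iov->ungt_f = iov->cha_gt;
        iov->cha_gt = get_ungot;
    }
    iov->ungt_c[iov->ungt_n++] = ch;
    return ch;
}
```

The benefit of this design is that the normal read path carries no "is there a pushed-back character?" test; the check exists only while the special entry is installed.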
REREAD resets the working tally .INREC to the start of the record, by copying the refresher tally .RECRD into it.
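BACKSP and REREAD both operate on the same pair of values, which the following C sketch models as plain indexes. This is a deliberate simplification: real tallies are machine tally words rather than array indexes, and the structure and function names are invented for the example.

```c
#include <stddef.h>

struct record_cursor {
    size_t recrd;   /* start of the record, in the role of .RECRD */
    size_t inrec;   /* current position, in the role of .INREC */
};

/* BACKSP: step back one character, but never before the record start.
   Returns 1 if the cursor moved, 0 otherwise. */
int backsp(struct record_cursor *c)
{
    if (c->inrec <= c->recrd)
        return 0;
    c->inrec--;
    return 1;
}

/* REREAD: copy the refresher value over the working value. */
void reread(struct record_cursor *c)
{
    c->inrec = c->recrd;
}
```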
The REWIND function resets the file position to the beginning of the stream by calling .SEEK to set the read/write position to the beginning of the file.
The FSFILE function scans ahead to the end of the file, then prepares to continue reading from the block after the end of the file. It prepares for reading by decrementing the BCW in the block image (to exclude the EOF) and setting the RCW to -1.
The following functions handle binary data instead of text data.
The following functions all refer to the IOV in some way.
For B output units, EOF writes an end-of-file, by calling REC.PT to terminate the last record and EOF.FI to write the end-of-file. For C output units, EOF does nothing, since the ANSI standard for C only recognizes EOF queries on input files.