B I/O Program Logic Manual

Thinkage Ltd.
85 McIntyre Drive
Kitchener, Ontario
Canada N2R 1H6
Copyright © 1998 by Thinkage Ltd.

1. Introduction
	1.1  Organization of this Manual
	1.2  Caveats
2. Design Considerations
	2.1  Efficiency Goals
	2.2  Pointers
	2.3  Naming Conventions
	2.3.1  Library Programmers
	2.4  Function Entries
	2.5  Logical vs. Physical Blocks
3. I/O Data Structures
	3.1  The Units Vector
	3.2  The FILE* Table
	3.3  I/O Control Vectors (IOVs)
	3.3.1  Design Notes for the IOV
	3.3.2  Referring to IOVs
	3.3.3  TSX0 Functions
4. Set-Up and Tear-Down
	4.1  IOVs Created at Load Time
	4.2  Creating IOVs with OPEN
	4.2.1  Overview of the Open Operation
	4.2.2  Details of the Open Operation
	4.2.3  Opening Strings for I/O
	4.3  Tear-Down of IOVs
	4.4  Concatenated Files
5. Sequential Character Input
	5.1  Fetching the Character
	5.2  Fetching a New Record
	5.3  Return From CHA.GT
	5.4  Fetching a Character String
6. Sequential Character Output
	6.1  Outputting the Character
	6.2  Outputting Logical Records
	6.3  Reaching the End of the Logical Block
	6.4  Outputting a Character String
	6.5  End of File
	6.6  COMDECKS
7. Seeking Operations
	7.1  The "fpos_t" Structure Type
	7.2  The fseek and ftell Functions
	7.3  The Seeking Process
	7.3.1  Step 1: Get the Logical Block into Memory
	7.3.2  Step 2: Adjustments for Partitioned Records
	7.3.3  Step 3: Reading Past the High Water Mark
	7.3.4  Step 4: Finding the Right Record
	7.3.5  Step 5: Entering an Ambiguous State
	7.3.6  Step 6: Resolving the Ambiguous State
	7.3.7  Notes on Seek Operations
	7.4  Rewind Operations
8. Error Handling
	8.1  Error Tables
	8.2  Posting Error Messages
	8.3  C's "errno" Facilities
	8.4  Default Messages
	8.5  Retrieving a Posted Message
	8.6  The IO.ERR Function
	8.6.1  Clearing an Error
9. Auxiliary Functions
	9.1  Flushing a Stream
	9.2  Backing Up a Stream
	9.3  The UNGET Operation
	9.4  The REREAD Function
	9.5  Rewinding a Stream
	9.6  The FSFILE Operation: Spacing an Input File Forward
	9.7  Binary Data
	9.8  Other Functions Using the IOV

1. Introduction

This manual describes the design and implementation of the UW Tools I/O library. The library is available to programs compiled with the B compiler of the UW Tools package, as well as to programs compiled with the single-segment (accommodation mode) versions of C and Pascal.

1.1 Organization of this Manual

The main body of this manual is divided into the following sections:

Design considerations underlying the I/O functions and the UW Tools library in general.
Description of data structures associated with the I/O system.
Set-up and teardown of I/O data structures.
Interaction and flow of control through data structures during input operations.
Interaction and flow of control through data structures during output operations.
Seek operations on disk files.
Error handling facilities.
Auxiliary functions that affect the I/O data structures.

1.2 Caveats

The low level functions and data structures described in this document may change with each release of the UW Tools package. They may even change when patches are applied to fix bugs. Thus user programs should never call these functions or reference these data structures directly--use high level functions instead.

If a function or data object has a dot "." anywhere in its name and is not documented under "expl b lib", it may change without notice and should not be referenced directly by user programs. Such functions and data objects are documented here in The UW Tools B I/O Program Logic Manual solely to help you understand program dumps.

The format of the I/O control vector in release 8FW1.5 differs substantially from previous versions. New vector entries have been added and many old entries have new offsets within the vector.

2. Design Considerations

This chapter looks at the design philosophy underlying the UW Tools library, particularly the I/O functions. Users who intend to create their own library functions should pay close attention to the principles provided here.

2.1 Efficiency Goals

The I/O routines have been designed to optimize performance under the "character stream" orientation of the programming languages supported. The library provides facilities for non-sequential I/O operations (e.g. seeking, backspacing, rewinding, record and binary I/O). However, such operations are rare in comparison to operations like "read a character" or "write a character". Fast sequential I/O was the highest design priority, and non-sequential operations were not allowed to compromise the efficiency of the sequential character stream.

2.2 Pointers

Many routines in the UW Tools library work with pointer arguments. This section describes differences between the ways that different programming languages use pointers.

In the B programming language, pointers can only address complete machine words. Typically then, B programs store pointers in the lower 18 bits of a machine word, leaving the upper halfword empty (i.e. zero).

C and Pascal programs, however, must be able to address individual characters within a machine word. This means they need 18 bits to reference the machine word and another 2 bits to reference one of the four bytes within the word. On GCOS8, the optimal format for such a pointer is to store the word address in the upper 18 bits of the word, and the byte offset in the next two bits. In some cases, it is also possible for the lower half-word to contain a non-zero SEGID; however, because the UW Tools library is only implemented in SS mode (accommodation mode), the vast majority of pointers have SEGID of zero. (For more information on SEGIDs, see the manuals describing the hardware and instruction set of your GCOS8 system.)

Because of this difference between languages, all library functions that work with pointers must accept either pointer format. The rule is:

If the upper 18 bits are zero, the pointer is assumed to be a B pointer, with the pointer value in the lower 18 bits of the word.
Otherwise, the pointer is assumed to be a C/Pascal pointer, with the pointer value in the upper 20 bits of the word.
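
The rule above can be sketched in C. This is only an illustration, not library code: it assumes a 36-bit word held in the low bits of a 64-bit integer, and the names word36, is_b_pointer, word_address, byte_offset and make_c_pointer are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 36-bit machine word, held in the low 36 bits of a 64-bit integer
   (an assumption made so the sketch runs on an ordinary host). */
typedef uint64_t word36;

#define WORD_MASK 0777777777777ULL   /* 36 bits */
#define HALF_MASK 0777777ULL         /* 18 bits */

/* Upper 18 bits zero: treated as a B pointer. */
static int is_b_pointer(word36 p)
{
    return ((p >> 18) & HALF_MASK) == 0;
}

/* Word address, whichever format the pointer is in. */
static unsigned word_address(word36 p)
{
    if (is_b_pointer(p))
        return (unsigned)(p & HALF_MASK);        /* lower 18 bits */
    return (unsigned)((p >> 18) & HALF_MASK);    /* upper 18 bits */
}

/* Byte offset within the word: the two bits below the word address
   in a C/Pascal pointer; always 0 for a B pointer. */
static unsigned byte_offset(word36 p)
{
    if (is_b_pointer(p))
        return 0;
    return (unsigned)((p >> 16) & 3);
}

/* Build a C/Pascal-format pointer (SEGID assumed to be zero). */
static word36 make_c_pointer(unsigned word_addr, unsigned byte_off)
{
    assert(word_addr <= HALF_MASK && byte_off < 4);
    return (((word36)word_addr << 18) | ((word36)byte_off << 16)) & WORD_MASK;
}
```

A function written this way can accept either pointer format, because it normalizes the argument before using the address.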

2.3 Naming Conventions

The UW Tools library contains two types of functions: user functions and internal functions.

User Functions: User functions may be called by any user program. Their calling sequences are stable from release to release, so that programs written for previous releases of the UW Tools packages keep working as expected with each new release. New features may be added to existing functions, provided they don't change the behavior of older code. If there is any chance of incompatibility, new features are added to the library by adding new functions instead of changing old ones; therefore old source code may continue to use the old functions without change. If new functions eventually make a particular old function obsolete, the old function is phased out over the course of several releases of the package, providing ample notice and time for programmers to change source code to use the new functions.
Internal Functions: Internal functions are intended to be called only by other functions in the library, not directly by user programs. They may be changed or discontinued, or have their calling sequences changed, without notice.

Internal functions are identified by the presence of at least one dot character "." in their names, as in

		  .READ    ACC.FI    ..IOVP

Similarly, the data objects used by the library divide into user objects and internal objects. Internal objects are identified by at least one dot character in their names.

Because the dot character is used to identify internal library functions and data objects, programmers should not put dots in the names of items in normal user code. This ensures that user names do not conflict with internal library names. The only "dot names" that you may safely define in your programs are ones which the documentation specifically says you may define (e.g. the .GET symbol whose presence or absence determines the way in which files are accessed).

It is not enough to avoid conflicts with existing dot names. Each new release of the UW Tools package may define new internal symbols with dots in their names, and if old source code conflicts with these new names, the result may be disastrous.

2.3.1 Library Programmers

Programmers who write their own low level library may wish to use dot names for these library routines. Since users are not supposed to use dot names for user functions, giving a dot name to a low level library function ensures no conflict with user names. However, there is no guarantee that such dot names will not conflict with UW Tools internal functions in later releases of the package.

When writing low level library functions for use with C programs, you should also use names that cannot conflict with user names. The ANSI standard for C reserves all external names beginning with an underscore for library use. However, the ANSI standard lets users create local names (e.g. local variables or macros) whose names begin with an underscore followed by a lowercase letter. For safety's sake, therefore, it is better to choose different names for library routines: names beginning with two underscores or an underscore followed by an uppercase letter. Users may not use such names for external or for local purposes.

NOTE: Programmers may know that the SS mode linker ignores the case of letters in external names. Therefore you might think there would be a conflict between an external name declared as _ABC and a local name declared as _abc. However, the C compiler pays attention to the case of letters, so the two names are distinct during compilation. The local name is only relevant during compilation--it is not seen by the linker--so the two names don't conflict.

If you are writing a low level library function that may be used in either B or C, it is best to use a dot name. Such names are not valid in C and are reserved in B, so you can be sure they do not conflict with user names. If you are writing the function in C, you can use #alias or #equate directives to create dot name symbols, even though such names are not valid in C.

2.4 Function Entries

In B, every function starts with a machine instruction that might be represented as

		  func     tra   func+1

In other words, this jumps to the second word of the function. Normally then, this instruction just continues on to the next instruction. However, consider two functions that start

		  X       tra   X+1
				...
		  Y       tra   Y+1

The operation

		  Y = X

writes the TRA instruction from X over the TRA instruction in Y. The result is that whenever someone calls Y, the TRA instruction now jumps to X+1 instead of Y+1. Thus anyone calling Y actually executes function X instead. Programs may make use of this principle to "equate" functions.

C functions do not have the same structure. Thus if you are programming a function in C but intend the function to be usable in both B and C, it's a good idea to add the TRA instruction at the beginning of the C function so that B programs can execute the assignment operation discussed above. You can do this by creating a small GMAP assembler module consisting of a TRA instruction followed by a reference to the C function. You could also use C code based on the following:

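		  static dummy1()
		  {
		  /* Code for the real function */
		  }
		  /* func is a data word holding a TRA to dummy1: the */
		  /* address of dummy1 plus the TRA opcode (0710000)  */
		  int (*func)() =
				 { (int (*)())
				   ( (int) dummy1 + 0710000)
				 };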

This creates a static (local) function containing the actual function definition, then sets up an external (global) function consisting of a TRA instruction to the local function. This will be known by the external name func, but func is just a jump to the real function.

2.5 Logical vs. Physical Blocks

The UW Tools library distinguishes between logical and physical blocks. A physical block is a collection of records read in a single physical I/O operation. Each physical block may contain several logical blocks; the number of logical blocks per physical block is the same throughout the file.

Suppose that each physical block contains three logical blocks. Then successive physical I/O operations might read in logical blocks 0-2, then 3-5, and so on, in multiples of three. If a seek operation wanted to seek to a location in logical block 4, the physical read operation would read in the physical block containing logical block 4 (which means logical blocks 3-5).
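
The arithmetic in this example can be written out as a short C sketch. The function names are hypothetical and blocks are assumed to be numbered from zero, as in the example above.

```c
#include <assert.h>

/* Index of the physical block that holds a given logical block. */
static long physical_block_of(long logical_block, long logical_per_physical)
{
    assert(logical_block >= 0 && logical_per_physical > 0);
    return logical_block / logical_per_physical;
}

/* First logical block stored in a given physical block. */
static long first_logical_in(long physical_block, long logical_per_physical)
{
    assert(physical_block >= 0 && logical_per_physical > 0);
    return physical_block * logical_per_physical;
}
```

With three logical blocks per physical block, logical block 4 falls in physical block 1, which starts at logical block 3 and therefore covers logical blocks 3-5.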

3. I/O Data Structures

This chapter looks at fundamental I/O data structures: the Units Vector, the C FILE* table, and the I/O Control Vector.

3.1 The Units Vector

The Units Vector is a B vector named "io.units". This vector is the key to finding information about any open I/O unit (i.e. I/O stream).

The Units Vector is indexed by B unit number. Therefore the vector contains entries indexed from -5 through 49, inclusive. For example, io.units[1] provides information on unit 1 (the standard output).

Each entry in the Units Vector is a pointer to the I/O Control Vector for the associated unit. If a particular unit is not open, the corresponding io.units entry is zero.
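
As a rough model of this arrangement, the C sketch below stores the entries for units -5 through 49 in an ordinary array and shifts the index. The names struct iov, unit_lookup and unit_attach are hypothetical; the real vector is a B vector, not a C array.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT_MIN (-5)
#define UNIT_MAX 49

/* Stand-in for an I/O Control Vector; the contents do not matter here. */
struct iov { int placeholder; };

/* One slot per unit number; a null entry means "unit not open". */
static struct iov *units_storage[UNIT_MAX - UNIT_MIN + 1];

/* Return the IOV for a unit, or NULL if the unit is closed or invalid. */
static struct iov *unit_lookup(int unit)
{
    if (unit < UNIT_MIN || unit > UNIT_MAX)
        return NULL;
    return units_storage[unit - UNIT_MIN];
}

/* Record an IOV for a unit; returns 0 on success, -1 for a bad unit. */
static int unit_attach(int unit, struct iov *v)
{
    if (unit < UNIT_MIN || unit > UNIT_MAX)
        return -1;
    units_storage[unit - UNIT_MIN] = v;
    return 0;
}
```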

3.2 The FILE* Table

The FILE* Table is named "_iounit". It serves the same purpose as the Units Vector, but is designed for use in C programs rather than B programs. A C pointer declared as FILE* points into the FILE* table. In turn, the indicated entry in the table contains the B unit number associated with the IOV.

You might expect that the FILE* pointer would point directly at the I/O Control Vector, rather than this indirect reference through the FILE* table. The indirection through the table is necessary to support the C freopen function, which lets you reassign an existing FILE* pointer to indicate a new stream. The freopen function is implemented simply by changing the appropriate FILE* table reference to a new B unit number.
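
The benefit of the indirection can be shown with a small C sketch. It is a simplified model under stated assumptions: file_entry, file_table, handle_unit and handle_reopen are hypothetical names, the table size is arbitrary, and the closing and opening of the underlying streams is omitted.

```c
#include <assert.h>

#define TABLE_SIZE 8   /* arbitrary size chosen for the sketch */

/* Each table entry holds only a B unit number. */
typedef struct { int unit; } file_entry;

static file_entry file_table[TABLE_SIZE];

/* Resolve a handle to the unit number it currently refers to. */
static int handle_unit(const file_entry *fp)
{
    return fp->unit;
}

/* freopen-style retargeting: the handle keeps its address and only
   the unit number stored in the table entry changes. */
static file_entry *handle_reopen(file_entry *fp, int new_unit)
{
    fp->unit = new_unit;
    return fp;
}
```

Any copy of the handle made before the reopen still refers to the same table entry, so it sees the new unit without being updated itself.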

Pascal programs make use of the FILE* Table in a manner similar to C. Since B programs only need the Units Vector, they don't have a FILE* Table. All programs contain the Units Vector.

3.3 I/O Control Vectors (IOVs)

Each open stream has an associated I/O Control Vector, providing all the information needed to perform I/O on that unit. For the rest of this manual, "I/O Control Vector" will be abbreviated as IOV.

The IOV begins with various types of control information, including pointers to functions that can perform I/O on the associated stream. Buffers for the I/O stream are allocated after this control information.

After the buffers come various pieces of auxiliary information. For example, the name of the file is stored as a string in the auxiliary information at the end of the IOV; the control information at the beginning of the IOV contains a pointer to this filename. The auxiliary information also contains the mode string argument passed to the OPEN function that opened the associated stream.

The list below shows the layout of an IOV, including the symbolic name of each entry, the offset (shown in parentheses) and the purpose of each entry. With halfwords, offsets are marked with L for the lower halfword and U for the upper. Note that some offsets have more than one symbolic name, since some offsets have different purposes in different contexts. Entries described as "functions" are actually TRA hardware instructions jumping to functions that perform the specified actions.

IOV.RS (-1U)

Requested buffer size, measured in llinks.

IOV.SZ (-1L)

Size of the IOV including the word at -1.

FL.IOV (0U)

Various bit flags:

		PEF.FL    000001  Physical EOF of file
						  (input)
		EOF.FL    000002  EOF on input
		BOF.FL    000004  Beginning of file on input
		STR.FL    000010  Core-to-core I/O unit
		NUL.FL    000020  Undefined unit
		TTY.FL    000040  TTY-like device (record vs
						  block I/O)
		NEF.FL    000100  no eof or flush on close
		LNO.FL    000200  line numbered input file
		NGF.FL    000400  no GFRC I/O
		SLW.FL    001000  outstanding slews
						  (media 7)
		MSS.FL    002000  message on error
		RET.FL    004000  return from error
		DEF.FL    020000  default I/O unit (i.e.
						  opened by .setu.)
		VIP.FL    040000  This is a vip
		FLS.FL    100000  Block needs to be flushed
		RAW.FL    200000  Raw type tty output.
		OUT.FL    400000  output unit

F2.IOV (0L)

More bit flags:

		BTS.FL    000001  .tell returns bytes from 0
		WEF.FL    000002  Write EOF on fseek, flush
		OFR.FL    000004  file can be read
		OFW.FL    000010  file can be written
		OFA.FL    000020  file was opened for append
		CAP.FL    000040  append mode: seek to eof
						  on output
		PRE.FL    000100  pre read next block after
						  write
		SSF.FL    000200  Allow seeks on sequential
						  files
		NRW.FL    000400  file opened with no rewind
		ZMO.FL    001000  return -1 on eof, not zero
		FNV.FL    002000  .FSIZE is estimated, not
						  definite
		ZDU.FL    004000  Duplicate AFTNAM check
						  suppressed on this iov
		OPC.FL    020000  .openc re-using this iov;
						  don't rslevec
		EFD.FL    040000  write EOF rcw on all
						  physical block output

.WRIOV (1U)

Pointer to associated output IOV (when the same device serves as both an input and output unit, e.g. a TTY or the console).

REF.CT (1L)

Reference count for this IOV.

CHA.SK (2)

Function to do a character seek on this IOV.

CHA.TL (3)

Function to do a character "fgetpos" on this IOV.

CHA.PT (4)

Function to output a character to the logical block.

CHA.GT (5)

Function to read a character from the logical block.

STR.GT (6)

Function to read a string from the logical block.

DES.BU (7)

Function to put a string from an EIS ADSC9 data construct to the logical block.

REC.PT (8)

Function to build an RCW on write operations.

REC.GT (8)

Function to attach a tally to a record on read operations.

BLK.GT (9)

Function to read a logical block.

BLK.PT (10)

Function to output a logical block.

BLK.WR (11)

Function to flush the current logical block(s).

EOF.FI (12)

Function to write an EOF to the current file.

RAW.IO (13)

Read/write I/O code for given device.

RAW.GT (13)

Function to read a physical block.

RAW.PT (14)

Function to write a physical block.

CHG.RD (15)

Function to change this IOV to read mode.

CHG.WR (16)

Function to change this IOV to write mode.

FLS.RD (17)

Function to flush input in read mode.

FLS.WR (18)

Function to flush output in write mode.

CLS.IO (19)

Function to clean up the IOV when it is being closed.

INI.RD (20)

Function to initialize this IOV for reading, including rewinding and setup.

INI.WR (21)

Function to initialize this IOV for writing, including rewinding and setup.

CLR.ER (22)

Function to attempt recovery from an error condition.

BLK.SZ (23)

Size of logical block in upper half, size of physical block in lower half.

P.IOV (24)

Machine pointer to the start of the real IOV (found in indirect IOVs).

FIL.NA (25)

Word length of current filename in upper half; pointer to current filename in lower half.

FIL.CT (26)

Is used when this IOV corresponds to a string of concatenated files, as in

		file1+file2+file3

The upper half of FIL.CT holds the total number of files in the string. The lower half of FIL.CT holds the number of the file that is currently in use.

OP.OPT (27)

Pointer to the option mode string specified when opening this unit.

B.SEEK (28)

A seek address into a file, telling where the start of the physical block comes from (for input) or goes to (for output).

B.END (29)

Pointer to the end of the valid information currently stored in the physical buffer. When flushing output, the flush routine will flush up to this point.

.RECRD (30)

Refresher tally to current record.

.INREC (31)

Working tally to current record.

.BLOCK (32)

Pointer to logical block in upper half; pointer to physical block in lower half.

.INBLK (33U)

Pointer to first free word in block image.

CHA.OF (34)

Adjustment to the tally difference between .RECRD and .INREC to get the current character number. For example, suppose that a seek operation refers to the 4th character in the current record. This will not actually be the 4th character from the beginning of the record, since each record starts with various types of control information. Furthermore, some media codes allow for data compression; for example, Media 7 can compress a long string of blanks into a much shorter representation. The CHA.OF value takes all this into account, so that the actual offset of the 4th character in the record will be CHA.OF+4.

CHA.NL (35)

Adjustment to get current character number when tallies have been reset between records. Normally, the current character position within a record can be determined by looking at the difference between the working tally and the refresh tally for the record (plus the CHA.OF adjustment). In between records, however, the working tally and refresh tally are equal. Therefore CHA.NL is used to preserve information needed to determine the current character position, relative to the last new-line.

COL.NO (36)

Adjustment to get output column number (output writers only). As with CHA.NL and CHA.OF, this adjustment is intended to let you establish the current column number position. COL.NO takes into account such things as tab characters, backspacing, and so on. To determine column number completely, you add COL.NO, CHA.OF, and the difference between the working tally and refresh tally.

LIN.NO (37)

The current line number (input readers only).

HI.WAT (38)

High water mark. This is the farthest position (sector) in the file on which the library has performed I/O. If you seek to a position past the high water mark, the library must read through the file from the high water mark to the desired seek position, in case the logical end-of-file occurs before the desired seek position.

IO.STA (39U)

I/O state (for media 3 and 7 writers).

LIN.SZ (39L)

Maximum length of line (for media 3 and 7 writers).

CASE.C (40)

ASCII case bit (040) on BCD input and output.

FLD.CT (41)

Number of characters in field (media 1 input).

LN.END (41)

Line overflow function (media 3 and 7 writers).

CT.CHR (42L)

Character to be repeated by CHR.CT.

UNGT.C (43)

Pointer to list of characters pushed back by ungetc in the upper half; the last character pushed back by ungetc in the lower half.

UNGT.F (44)

Safe store of CHA.GT for ungetc.

FL.FLG (45)

For TTY output, this is a flag indicating whether there is a carriage return on the last line. A value of -1 indicates no pending new-line. A value of zero indicates a possible pending new-line from null input. A value greater than zero indicates a pending new-line from the last I/O.

SLW.CT (45)

Pending slew count (media 3 and 7 writers).

CHR.CT (45)

Count of repeated characters (media 1, 2, 3, 7 input).

RC.SEG (45)

Record segment (media 0 and 6 writers, when writing partitioned records).

EXT.CT (45)

Character count of file extensions past EOF. Note that a seek past end-of-file (on a binary file) does not write intervening characters to that file until this becomes necessary. If, for example, the user seeks past the end of the file but doesn't actually perform a write operation there, there is no need to write intervening characters past the end of the file.

MED.CD (46)

Media and report code.

AFT.NA (47-48)

Holds the ASCII aftname of the file (TSS).

FIL.CD (47)

Holds the BCD file code (batch).

TP.LBL (48)

Holds the tape label in batch (0 on unlabelled tape).

SYO.CD (48)

Holds the file code for MME GESYOT in batch.

FL.STA (49-50)

File I/O status returned by TSS DRLs or GCOS8 MMEs.

FL.DES (51)

File descriptor word.

.BKSPL (52U)

List of overstruck characters (media 3 and 7 writers).

.TABST (53)

List of tab stops (writers).

SYBUF. (54L)

Pointer to GESYOT buffer.

TP.FCT (54L)

Pointer to label Fact (for a tape). For further information, see the UFAS manual.

SEC.NO (55)

Sector number telling where the library believes the file is positioned.

ERR.NO (56)

Error number of the last error encountered on this unit.

ERR.SV (57)

Pointer to I/O routine transfer save area.

.FSIZE (58)

Size of file in llinks.

SEQ.OF (59)

Next expected block sequence number in upper half; alternate valid value in lower half. Normally, these are the same. However, after end-of-file, the bottom half is set to 1. This indicates that the next block sequence number may be 1 (for the start of the next file) or the next number in sequence. If the next block sequence number read doesn't match either value in SEQ.OF, the library considers it an error.

VIP.NL (60)

TSS is inserting new-lines on VIP input.

.SEGNO (60)

Segment number for continuing partitioned records.

SY.BUF (60)

Start of GESYOT buffer.

TP.BLK (60)

Tape block number.

S.O.B. (61)

Start of block image.

Pointers are generally in the upper half unless otherwise noted.
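
Many entries in the list above pack two values into one word (marked U and L), and the flag entries are tested bit by bit. The C sketch below shows one way to read such a word. It assumes a 36-bit word held in a 64-bit integer; the flag values are copied from the FL.IOV and F2.IOV lists, but the C names (with underscores instead of dots) and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 36-bit word held in the low bits of a 64-bit integer (assumption). */
typedef uint64_t word36;

#define HALF_MASK 0777777ULL   /* 18 bits */

static unsigned upper_half(word36 w) { return (unsigned)((w >> 18) & HALF_MASK); }
static unsigned lower_half(word36 w) { return (unsigned)(w & HALF_MASK); }

/* Flag values from the FL.IOV (0U) list. */
#define EOF_FL 0000002u   /* EOF on input */
#define OUT_FL 0400000u   /* output unit  */

/* Flag values from the F2.IOV (0L) list. */
#define OFR_FL 0000004u   /* file can be read    */
#define OFW_FL 0000010u   /* file can be written */

/* Word 0 of the IOV: FL.IOV in the upper half, F2.IOV in the lower. */
static int is_output_unit(word36 word0)
{
    return (upper_half(word0) & OUT_FL) != 0;
}

static int can_read(word36 word0)
{
    return (lower_half(word0) & OFR_FL) != 0;
}
```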

3.3.1 Design Notes for the IOV

The IOV has been designed to optimize the normal flow of I/O operations on the stream. The most important aspect of this optimization is the presence of the function pointers in the control information section. These point to functions which are appropriate for performing operations on the IOV, according to the current state of the stream.

As a simple example, consider the function associated with CHA.PT, the function that outputs a character to the buffer. Suppose that the user opens a file as a Media 6 text file. The open operation therefore sets up the CHA.PT entry to point to a function that outputs a single character into a Media 6 format record, stored in the buffer of the IOV. Later on, the program may issue a SET.MC function to change media codes for the stream. One of the effects of SET.MC will be to change the CHA.PT pointer so that it points to a function that outputs the new media code rather than the old.

Note that this approach removes some overhead whenever the program wants to output a character, since the program doesn't have to choose different actions based on the current media code. The program just has to call the current CHA.PT function, in the knowledge that CHA.PT has already been set up to use the right media code. Thus the I/O routines don't have to check the media code each time a character is output; the IOV is set up appropriately whenever the media code changes, and things happen automatically after that.

Changing media codes is just one of the actions that might change function pointers in the IOV. For example, suppose you are using a media code in which text records have a fixed length:

When the usual CHA.PT function notices that there's only room for one more character in the current record, the function may put a new pointer into CHA.PT: a pointer to a function that can handle end-of-record.
The next time the program calls CHA.PT to output a character, you get the special end-of-record function which outputs the character and does whatever extra processing is necessary to finish the record. This end-of-record function may change CHA.PT to point to a third function, a special start-a-new-record function.
The next time you output a character, the start-a-new-record CHA.PT automatically starts a new record and stores the output character at the beginning of this new record. The function changes CHA.PT once again to point to the normal output-a-character function.
After an error, the existing CHA.PT is replaced with a special version that always returns end-of-file. The "clear error" operation restores the previous version of CHA.PT.

As this example shows, the IOV function entries may point to different functions, depending on the context of what has gone before. This eliminates the overhead that would be necessary if the output-a-character routine were a single function that had to check for a variety of contexts each time it was called. Fewer decisions have to be made during normal I/O because each I/O function knows it will only be called in a given context.
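
The fixed-length record example can be modelled in C with a function pointer that each routine replaces as the context changes. This is a sketch of the technique only, not the library's code: the structure, the record length of 4, and all function names are hypothetical, and the record length is assumed to be at least 2.

```c
#include <assert.h>
#include <string.h>

#define REC_LEN  4   /* hypothetical fixed record length (must be >= 2) */
#define MAX_RECS 4   /* hypothetical buffer capacity, in records */

struct stream;
typedef void (*put_fn)(struct stream *, char);

struct stream {
    put_fn cha_pt;                   /* plays the role of CHA.PT */
    char   recs[MAX_RECS][REC_LEN];  /* record storage */
    int    rec;                      /* current record index */
    int    pos;                      /* next free position in the record */
};

static void put_normal(struct stream *s, char c);
static void put_end_of_record(struct stream *s, char c);
static void put_start_record(struct stream *s, char c);

/* Usual case: store the character; if only one slot is left,
   hand the next call over to the end-of-record routine. */
static void put_normal(struct stream *s, char c)
{
    s->recs[s->rec][s->pos++] = c;
    if (s->pos == REC_LEN - 1)
        s->cha_pt = put_end_of_record;
}

/* Store the last character of the record; real code would also
   finish the record here. The next call starts a new record. */
static void put_end_of_record(struct stream *s, char c)
{
    s->recs[s->rec][s->pos++] = c;
    s->cha_pt = put_start_record;
}

/* Begin a new record, then fall back to the usual routine. */
static void put_start_record(struct stream *s, char c)
{
    assert(s->rec + 1 < MAX_RECS);
    s->rec++;
    s->pos = 0;
    s->cha_pt = put_normal;
    put_normal(s, c);
}

static void stream_init(struct stream *s)
{
    memset(s, 0, sizeof *s);
    s->cha_pt = put_normal;
}

/* The caller never tests the context; it just calls through the pointer. */
static void put_char(struct stream *s, char c)
{
    s->cha_pt(s, c);
}
```

Note that put_char contains no conditional at all. Every decision about record boundaries is made once, at the moment the context changes, rather than on each character.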

Because the IOV combines data with local functions specifically designed to manipulate that data, the IOV is an example of an Object, in the sense of Object-Oriented Programming.

3.3.2 Referring to IOVs

Many library functions accept arguments specifying the stream on which you want to perform I/O. In the current release, the argument you pass can be any of the following:

A unit number, indicating an offset into the io.units table. This will always be an integer between -5 and 49.
A pointer into the FILE* table.
A pointer to the IOV associated with the stream.

The library identifies the nature of the passed argument by looking at its value. An integer less than 100 is taken to be a unit number. Any other value is taken to be a pointer to the IOV, unless it points into the block of memory occupied by the FILE* table, in which case it is taken to be a FILE*.
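
The test described above might be sketched in C as follows. The sketch treats the argument as a plain integer word address, which sidesteps the B and C pointer formats; the FILE* table bounds are invented for illustration, and the names classify and stream_ref are hypothetical.

```c
#include <assert.h>

enum stream_ref { REF_UNIT, REF_FILE_ENTRY, REF_IOV };

/* Hypothetical location and size of the FILE* table, in words. */
#define FILE_TABLE_START 01000L
#define FILE_TABLE_WORDS 64L

/* Decide what kind of stream reference an argument value is. */
static enum stream_ref classify(long arg)
{
    if (arg < 100)
        return REF_UNIT;                 /* small integer: unit number */
    if (arg >= FILE_TABLE_START &&
        arg < FILE_TABLE_START + FILE_TABLE_WORDS)
        return REF_FILE_ENTRY;           /* points into the FILE* table */
    return REF_IOV;                      /* anything else: pointer to an IOV */
}
```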

3.3.3 TSX0 Functions

Some of the functions referenced in the IOV are designed to be called with a TSX0 hardware instruction rather than the usual TSX1. These TSX0 functions have an abbreviated calling sequence that makes them quicker to use in appropriate situations. In particular, TSX0 functions don't perform the usual operations to save and restore the values in certain hardware registers; thus TSX0 functions can do the jobs more quickly, but they may only be called in situations where you've already saved any necessary register values.

For most TSX0 functions, there are corresponding TSX1 functions that can be called when the TSX0 versions are too quick-and-dirty to handle the situation. For example, ".READ" is a TSX1 version of the TSX0 function "..RD". Both do the same operation (see "expl b lib .read") but ..RD is faster in applicable situations.

The table below lists the TSX0 functions currently implemented. Function arguments are passed in various hardware registers, as noted. Some of the functions are invoked with an actual TSX0 instruction, others with an XED. In the table, IOVP stands for a standard B pointer to an IOV.

TSX0 ..IOVP: A: a value which may be a unit number, IOVP, or FILE* pointer
Returns the corresponding IOVP in the Q register.

TSX0 ..NOEO: AL: an IOVP
Resets the IOV to a non-EOF status, in preparation for continued reading.

TSX0 ..RD: A: a unit number
Sets the current read unit to the number given in the A register and returns the number of the previous read unit in the Q register.

TSX0 ..UIOV: A: a unit number
Returns the corresponding IOVP in the Q register; if the unit number is invalid, it returns the value of RD.NULL.

TSX0 ..WR: A: a unit number
Sets the current write unit to the number given in the A register, and returns the number of the previous write unit in the Q register.

TSX0 .BLKNO: AR0: an IOVP
Uses the Q register to return the current block number for the specified IOV.

TSX0 .COL: A: an IOVP
Uses the Q register to return the current column number for the specified IOV.

TSX0 .CTDIF: A: an IOVP
Uses the Q register to return the difference in tally count (.RECRD-.INREC).

TSX0 .FNIOV: A: an IOVP
Uses the Q register to return a pointer to the filename associated with the given IOV; if there is no filename, the Q register will be zero.

TSX0 .FNRGS: X1: as set by a TSX1 instruction calling another function
Uses the A register to return the number of arguments in a GFRC caller sequence.

TSX0 .LEN..: A: a pointer to a C or B string
QU: a pointer to where the string length should be stored
Determines the length of the string and stores that length in the location indicated by QU.

TSX0 .LENC.: A: a pointer to a C string
QU: pointer to where the string length should be stored
Determines the length of the string and stores that length in the location indicated by QU.

TSX0 .LENB.: A: a pointer to a B string
QU: pointer to where the string length should be stored
Determines the length of the string and stores that length in the location indicated by QU.

TSX0 .VUNIT: A: a unit number or FILE* pointer
Returns with the condition code set to ZERO if the value in the A register represents a valid unit; if a FILE* pointer was passed, the A register will contain the associated B unit number. .VUNIT sets the condition code to NZ if the argument passed in the A register is not a valid B unit or FILE*.

TSX0 A.BPTR: A: a pointer
Uses the A register to return the given pointer in B pointer format. In other words, it converts a C pointer to a B pointer, and leaves B pointers alone.

TSX0 Q.BPTR: Q: a pointer
Same as A.BPTR, except that it works with the Q register.

TSX0 A.CHAR: A: a pointer
Uses the A register to return the given pointer value as a character offset from zero.

TSX0 Q.CHAR: Q: a pointer
Same as A.CHAR, except that it works with the Q register.

TSX0 A.CPTR: A: a pointer
Uses the A register to return the given pointer in C pointer format. In other words, it converts a B pointer to a C pointer, and leaves C pointers alone.

TSX0 Q.CPTR: Q: a pointer
Same as A.CPTR, except that it works with the Q register.

TSX0 IOV.IN: A: IOVP to an indirect IOV
Q: IOVP to a real IOV
Links the indirect IOV to the real IOV.

TSX0 PC.PAD: AR0: IOVP to an output stream
A: a character
Q: a number (integer)
Writes the given character the given number of times to the output stream (used to pad output, e.g. to write out strings of blanks).

TSX0 Q.WAIT: Must immediately follow a MME GEINOS sequence. Q.WAIT waits for the I/O to complete, and issues MME GERELCs while waiting. It is used in batch I/O routines.

TSX0 SET.WR: AR0: an IOVP
Puts the addresses of disk write routines into the specified IOV.

XED .CHAM1 or XED .CHQM1: AR0: an IOVP
Uses the A or Q register to decrement the CHA.OF entry of the IOV by 1.

XED .CHAM2 or XED .CHQM2: AR0: an IOVP
Uses the A or Q register to decrement the CHA.OF entry of the IOV by 2.

XED .CHAP2 or XED .CHQP2: AR0: an IOVP
Uses the A or Q register to increment the CHA.OF entry of the IOV by 2.

4. Set-Up and Tear-Down

This chapter deals with the allocation and de-allocation of IOVs.

4.1 IOVs Created at Load Time

A few IOVs are created at the time the program is loaded. Each such IOV may be referenced by a symbol (SYMDEF):

RD.TTY: for input from the terminal.
WR.TTY: for output to the terminal.
WR.SYS: for the "system" function.

4.2 Creating IOVs with OPEN

All other IOVs are created through the OPEN function. This includes IOVs associated with files and with other I/O devices, like the SYSOUT queue and the console.

Note that the C fopen function and the Pascal openf procedure both call the UW Tools OPEN to do their work. OPEN may also be called during program set-up, to handle redirection or to deal with P* and the console in batch.

4.2.1 Overview of the Open Operation

Roughly speaking, OPEN follows these steps:

OPEN parses its arguments and decides what it has to do.
It uses the standard memory allocation function to allocate an IOV of the size required for this stream. The IOV must be big enough to hold the control information, the buffers, and the auxiliary information at the end of the IOV.
OPEN calls ACC.FI, which accesses the file under an AFT name or file code. ACC.FI inserts some data into the allocated IOV.
Finally, OPEN calls a number of additional set-up routines to fill more entries in the IOV. The routines that OPEN calls depend on the options specified for the OPEN operation and the type of file being opened.

4.2.2 Details of the Open Operation

Below we give a more detailed explanation of the steps executed by the OPEN function.

OPEN parses its arguments. Based on those arguments, OPEN allocates enough memory for the IOV, including the buffers and the auxiliary information at the end of the IOV.
OPEN calls OPN.SP to determine if this operation is opening the standard input, the standard output, or $TTY. If so, there are two cases:
- If the operation does not involve concatenation of a file with one of the standard units, OPEN simply returns the existing IOV for RD.TTY or WR.TTY (created at the time the program was loaded, as noted earlier).
- If the operation involves concatenation of a file with one of the standard units, as in "file+$TTY", OPEN creates an indirect IOV for the unit. The functions in this IOV simply jump to the corresponding functions in the IOV for the standard unit, and the P.IOV entry is changed to refer to the TTY IOV.
If OPEN is opening an ordinary file (as opposed to the standard input or output), it calls ACC.FI to access the file.
Normally, ACC.FI accesses the file directly. However, if the program has a SYMREF to the external symbol ".GET", the version of ACC.FI that is loaded with the program is not the standard one. The non-standard ACC.FI does a CALLSS to the .GET subsystem to access the file instead of accessing the file directly. Using the non-standard ACC.FI saves about 500 words in the load module, but it means that you must have enough room in the CALLSS stack to call .GET; the non-standard ACC.FI is also much slower at accessing files.
If the unit is being opened for reading, OPEN calls ".READ". This sets values in several variables associated with reading:

.RD.UN

the number of the current read unit; thus the current read unit becomes the unit that is in the process of being opened.

.RD.IOV

pointer to the current read IOV. Because of null units and indirect IOVs, .RD.IOV is not necessarily the same as io.units[.RD.UN].

.READ also sets address register RD to point to the same place as .RD.IOV.
If the unit is being opened for writing, OPEN calls ".WRITE". This sets values in several variables associated with writing:

.WR.UN

the number of the current write unit.

.WR.IOV

pointer to the current write IOV. Because of null units and indirect IOVs, .WR.IOV is not necessarily the same as io.units[.WR.UN].

.WRITE also sets address register WR to point to the same place as .WR.IOV.
The next step is to put function pointers into the appropriate entries of the IOV. To do this, OPEN calls SET.IO which in turn calls one of the following:
```
			SETD.R   -- read operations on disk files
			SETD.W   -- write operations on disk files
			SETD.A   -- append operations on disk files
			SETT.R   -- read operations on tape
			SETT.W   -- write operations on tape
			SETT.A   -- append operations on tape
			
```
These routines set up as many of the function pointers as possible. However, they may not be able to set up all the function pointers. For example, SETD.R cannot fill in entries that depend on media code--since the file hasn't been read yet, SETD.R doesn't know the media code of the first record. SETD.R does fill in the function pointers for any entries that are device dependent but not media code dependent.

SYSOUT in batch is a special case. If this OPEN operation opens SYSOUT in batch, SET.IO handles that directly without calling an auxiliary routine.
If the file is being opened for appending, OPEN also calls the CHA.SK routine to position the file at end-of-file.
If the file is being opened for output (either appending or overwriting), OPEN finishes the set-up by calling "SETD.W". SETD.W fills in function pointers for device dependent function entries, then invokes WR.MCA to fill in function pointers for media code dependent entries.
WR.MCA determines the media code that should be used for the first output record, using a default media code if the options to OPEN didn't specify a media code explicitly. Once WR.MCA determines the media code, it calls a media code specific routine to put function pointers into the appropriate entries of the IOV. The media code specific routines are:
```
		WR.MC0     WR.MC1     WR.MC2     WR.MC3
		WR.MC4     WR.MC5     WR.MC6     WR.MC7
		WR.M10
```
The media code specific routines all end by invoking WR.MCN. This performs common set-up operations, specifically tally initialization.
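
The routine selection described in the steps above can be pictured as two table lookups. The following Python sketch is illustrative only: the library itself is written in assembler, and the function names select_setup_routine and media_code_routine, as well as the device and mode strings, are hypothetical names invented for this example. Only the routine names (SETD.R, WR.MC0, and so on) come from the lists above.

```python
# Illustrative sketch only; the real routines are assembler code in the library.
# The device/mode strings and both function names are assumptions.
SETUP_ROUTINES = {
    ("disk", "read"): "SETD.R",
    ("disk", "write"): "SETD.W",
    ("disk", "append"): "SETD.A",
    ("tape", "read"): "SETT.R",
    ("tape", "write"): "SETT.W",
    ("tape", "append"): "SETT.A",
}


def select_setup_routine(device, mode):
    """Return the name of the routine SET.IO would call.

    None models the SYSOUT-in-batch case, which SET.IO handles itself.
    """
    if device == "sysout-batch":
        return None
    return SETUP_ROUTINES[(device, mode)]


def media_code_routine(media_code):
    """Return the name of the media code specific routine chosen by WR.MCA."""
    if media_code == 10:
        return "WR.M10"
    if 0 <= media_code <= 7:
        return "WR.MC%d" % media_code
    raise ValueError("no write set-up routine listed for media code %d" % media_code)
```

In this picture, each routine named by media_code_routine would finish by invoking the common WR.MCN step.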

4.2.3 Opening Strings for I/O

When OPEN is called to open strings for I/O, the process parallels the process for opening a file. However, OPEN does not call the SET.IO function to put appropriate function pointers into the IOV. Instead, it calls one of the following functions:

		  SETS.R  -- open string for reading
		  SETS.W  -- open string for writing
		  SETS.A  -- open string for appending

All of these functions operate in a similar way.

The first step is to set up TALLYB pointers to the string being opened. These are set up in the IOV and are created regardless of whether the string is being opened for input or for output.

When the string is being opened for input, OPEN next sets up the CHA.GT entry of the IOV. This points to a function which fetches one character at a time from the string, but which refuses to read past a null byte ('*0'). It also sets up the STR.GT entry of the IOV; this points to a function which fetches an entire string, up to the next new-line. OPEN doesn't need to set up any of the other function entries in the IOV.

When the string is being opened for output, OPEN must set up the following entries:

CHA.PT: outputs a single character.
DES.BU: puts a string from an ADSC9 into the output string.
REC.PT: builds an RCW and outputs a record. If the working tally runs out, the string I/O version of REC.PT extends the tally when called.
EOF.FI: writes an EOF. The string I/O version just writes a '*0' to the string.

4.3 Tear-Down of IOVs

The CLOSE function is in charge of closing a file and releasing IOVs. CLOSE follows these steps:

If the unit being closed was open for output, CLOSE must make sure the final record is written. Therefore, CLOSE calls the function referenced by REC.PT to terminate the final record (if necessary), then calls the function referenced by EOF.FI to write an end-of-file mark on units that need it (disk files and tapes).
CLOSE then calls the function in CLS.IO for any device-specific actions that must take place.
CLOSE deallocates the memory occupied by the IOV and its buffers.
If the file should be removed from the AFT, CLOSE calls RET.FI to remove the file. RET.FI simply performs the system call that is appropriate to the program environment (batch or TSS). Note that the IOV's memory is deallocated before calling RET.FI, in the hope that the same memory can be used as a file system buffer for the system call.
CLOSE calls .UNSTK to remove the file's IOV from the Units Vector and the FILE* Table.
CLOSE calls .READ and/or .WRITE to update the current read and write unit variables.

4.4 Concatenated Files

The UW Tools library lets you specify a concatenated file list in any context where a single input file is accepted. Such a list is written as

		  file1+file2+file3+...

where any number of files may appear in the list.

When you pass a file list as an argument to OPEN, OPEN breaks the list into separate file names and stores these filenames in the auxiliary space at the end of the IOV. Each file name is stored as a B string, with a null byte (*0) used to mark the end of the string. Each file name starts on a word boundary. For example, file1+file2 is stored in four machine words as

		  file
		  1*0*0*0
		  file
		  2*0*0*0

In earlier releases of the package, concatenated file strings were stored as one long string.
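
The storage layout shown above can be reproduced with a short Python sketch. It assumes four characters per machine word, as in the example, and assumes that every name receives at least one terminating null byte. The function name pack_file_list is hypothetical; the library does this work in assembler inside OPEN.

```python
BYTES_PER_WORD = 4  # assumption: four characters fit in one machine word


def pack_file_list(file_list):
    """Split 'a+b+c' and return, for each file name, its list of 4-character words."""
    packed = []
    for name in file_list.split("+"):
        text = name + "\0"                    # terminating null byte (*0)
        text += "\0" * (-len(text) % BYTES_PER_WORD)  # pad to a word boundary
        words = [text[i:i + BYTES_PER_WORD]
                 for i in range(0, len(text), BYTES_PER_WORD)]
        packed.append(words)
    return packed
```

For the list file1+file2 this yields two words per name, matching the four words shown in the example.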

The FIL.NA entry of the IOV refers to the name of the file currently being read; the lower half of FIL.NA points to the file name string and the upper half tells the length of the string (in words). The upper half of the FIL.CT entry tells the total number of files in the concatenated list and the lower half tells the number of the file you're currently reading.
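
As a rough illustration of how two values share one IOV entry, the following Python sketch packs and unpacks the halves of a word. It assumes a 36-bit word with 18-bit halves; the helper names pack_halves and unpack_halves are hypothetical and do not correspond to library routines.

```python
HALF_BITS = 18                       # assumption: 36-bit word, two 18-bit halves
HALF_MASK = (1 << HALF_BITS) - 1


def pack_halves(upper, lower):
    """Combine two 18-bit values into one word."""
    return ((upper & HALF_MASK) << HALF_BITS) | (lower & HALF_MASK)


def unpack_halves(word):
    """Return (upper half, lower half) of a word."""
    return (word >> HALF_BITS) & HALF_MASK, word & HALF_MASK


# A FIL.CT-style value: three files in the list, currently reading the second.
fil_ct = pack_halves(3, 2)
```

Here unpack_halves(fil_ct) gives back the pair (3, 2), the total number of files and the number of the current file.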

When the library reaches end-of-file on one file in the list, it calls CLOSE to close the file's IOV. It then opens the next file in the list, re-using the same IOV in order to avoid fragmenting memory. Because the library uses the same IOV for all the files in the list, the IOV is allocated with the maximum size needed for any type of input device (disk file, tape, terminal, or console).

If one of the files in the concatenated list is $TTY, referring to the terminal, OPEN creates an indirect IOV for the file list. This IOV can refer to IOVs for files or to one of the predefined terminal IOVs, set up at load time.

5. Sequential Character Input

This chapter describes the steps followed during a typical operation to read a single input character. For example, when you call the C getchar function or the B getc function, this chapter describes the low level I/O functions that are invoked to honor the "get character" request.

5.1 Fetching the Character

The high level I/O function jumps to the CHA.GT entry of the IOV associated with the file or device you want to read. This entry jumps in turn to a "get character" function for the media code of the input stream. Possible functions are:

		  GC.MC0    GC.MC1    GC.MC2    GC.MC3
		  GC.MC4    GC.MC5    GC.MC6    GC.MC7
		  GC.M10

Each of these returns the next character in the input stream. Normally, this is just the next character from the record that is in the input buffer. However, if the invoked function finds that it has reached the end of the current record, it transfers control to "GC.FCH".

GC.FCH has the job of fetching a new record, then returning the first character of the new record. To fetch the new record, it uses the REC.GT entry of the IOV. When REC.GT returns, GC.FCH uses CHA.GT from the IOV to get the first character of the new record. GC.FCH returns this character.

Whenever the I/O library functions know that they're going to have to get a new record the next time they want to read a character, they stuff the address of GC.FCH into the CHA.GT entry of the IOV. This ensures that the new record is automatically fetched the next time someone calls CHA.GT. The media code reading functions stuff GC.FCH into CHA.GT when they hit end-of-record.

The record level I/O routine stuffs GC.FCH into CHA.GT whenever it has to read a new record. The reasoning is that the new record may have a different media code from the previous record and therefore CHA.GT may have to be changed to refer to a different "get character" function.
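
The way CHA.GT is repointed can be sketched in Python as follows. This is a toy model under stated assumptions, not the library's code: the class ToyIOV and the functions rec_gt, gc_media, and gc_fch are hypothetical stand-ins, records are plain strings, there is only one "media code" reader, and end-of-file is reduced to returning None.

```python
class ToyIOV:
    """Toy stand-in for an IOV that reads from a list of records."""

    def __init__(self, records):
        self.records = list(records)  # records still to be fetched
        self.record = ""              # current record image
        self.pos = 0                  # position within the current record
        self.cha_gt = gc_fch          # nothing loaded yet: fetch on first read


def rec_gt(iov):
    """Fetch the next record and select the reader for it."""
    iov.record = iov.records.pop(0)
    iov.pos = 0
    iov.cha_gt = gc_media             # a real library picks this by media code


def gc_media(iov):
    """Return the next character of the current record."""
    if iov.pos >= len(iov.record):    # empty record: go and fetch another
        return gc_fch(iov)
    ch = iov.record[iov.pos]
    iov.pos += 1
    if iov.pos == len(iov.record):    # end of record: next read must fetch
        iov.cha_gt = gc_fch
    return ch


def gc_fch(iov):
    """Fetch a new record, then return its first character."""
    if not iov.records:
        return None                   # simplified end-of-file
    rec_gt(iov)
    return iov.cha_gt(iov)
```

A caller always reads through iov.cha_gt(iov), so it never needs to know whether the next character comes from the buffer or requires a new record.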

5.2 Fetching a New Record

The REC.GT entry of the IOV jumps to a function that gets a new record from the associated stream. The function depends on the type of device associated with the IOV. Possibilities are:

GT.TTY and GT.CON: for terminals and the console, respectively. Both get the next record by performing the physical I/O, then transfer back to CHA.GT to get the next character from that record.
GT.REC: for blocked record devices (disk or tape). This function prepares the next record. GT.REC checks each BCW and RCW for validity as they are encountered. If any are invalid, the function either exits or continues, depending on the flags set for this stream during the OPEN operation (RET.FL in FL.IOV); also, the function may or may not issue an error message, depending on the MSS.FL flag in FL.IOV. GT.REC skips records which have media code 8 or some other unsupported media code. For all other records, GT.REC constructs tallies to the record in IOV entries .RECRD and .INREC and sets CHA.GT to the routine appropriate for reading the media code of the new record.

If GT.REC finds that it has reached the end of the logical block, it uses BLK.GT from the IOV to read a new logical block. In turn, BLK.GT calls one of the following:

GT.PHY: if it is necessary to get a new physical block.
GT.SPC: if you want to read a specific block instead of the next logical block.
GT.BLK: to read the next logical block. This advances the block pointer stored in the upper half of the IOV entry .BLOCK.

When a new physical block must be read from a disk file, GT.PHY calls the RAW.IO function as specified in the IOV, which in turn calls a library function named "RD.DSK". RD.DSK performs the physical read operation with the system call appropriate to the environment, and adjusts the status to reflect the true degree of success of the operation, setting a good status if any data was read successfully.

When a new physical block must be read from a tape file, the operation is similar, but RAW.IO calls a library function named RD.TAP instead.

After a physical read operation, the library functions update the seek address for disks and the block count for tapes. They also set the logical block pointer (in the upper half of .BLOCK) to the top of the physical block (indicated in the lower half of .BLOCK).

RD.DSK and RD.TAP may call the function IO.ERR if they detect an error. IO.ERR may or may not write out an error diagnostic message, depending on the setting of the MSS.FL bit in FL.IOV. Then it either exits, or it returns the negative of the major status, depending on the setting of the RET.FL bit in FL.IOV. This return value is usually propagated back up the chain of library functions. However, in the case of end-of-file, library functions return 0 or -1, depending on the value of the ZMO.FL flag in F2.IOV.

5.3 Return From CHA.GT

If CHA.GT receives a non-zero return value or logical end-of-file (indicated by a zero-length RCW), the CHA.GT entry in the IOV is set to the function "GT.EOF". This is a special function that always returns the end-of-file character. Thus, all subsequent "get character" operations return EOF.
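
A minimal Python sketch of this "sticky" end-of-file behaviour is given below. The class ToyStream, its method names, and the EOF value of -1 are assumptions made for the example.

```python
EOF = -1  # assumed end-of-file value for this sketch


class ToyStream:
    """Toy input stream whose cha_gt entry is replaced once EOF is seen."""

    def __init__(self, data):
        self.data = list(data)
        self.cha_gt = self._get_char

    def _get_char(self):
        if not self.data:
            self.cha_gt = self._gt_eof   # every later call goes straight here
            return EOF
        return self.data.pop(0)

    def _gt_eof(self):
        return EOF
```

Once _gt_eof is installed, later reads return EOF without touching the stream again.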

5.4 Fetching a Character String

The process of reading a string (e.g. with the B function getstr) is based on the facilities for reading a single character.

The STR.GT entry of the IOV points to a function that reads a string from the logical block. The function may be one of the following:

GS.1CH: used for Media 1 input, print images, and characters that have been pushed back with UNGETC. (The UNGETC function automatically sets STR.GT to GS.1CH.)
GS.ASC: used for regular ASCII input.
GS.BCD: used for BCD input.

In simple cases, the routine indicated by STR.GT uses an EIS move (with translate if required) to read the substring from the logical block into the memory location specified by the calling function. The routine advances the working tally (.INREC) by the appropriate amount and calls REC.GT if the tally runs out. In more complicated cases (COMDECKS, print images, and times when UNGETC has been used to push back characters), GS.1CH calls CHA.GT repeatedly to read one character at a time until the requested string has been read.

6. Sequential Character Output

This chapter describes the steps followed during a typical operation to write a single output character. For example, when you call the C putchar function or the B putc function, this chapter describes the low level I/O functions that are invoked to honor the "put character" request.

6.1 Outputting the Character

The first step is to call the CHA.PT function as specified in the IOV. This may indicate any of the following routines:

PC.BCD: for variable and fixed length (card image) BCD output.
PC.CDK: for COMDECK output (Media code 1).
PC.BC3: for BCD print image.
PC.ASC: for normal ASCII (the terminal, Media Code 6, or Media Code 10).
PC.APR: for ASCII print image.

In all cases, the function edits horizontal and vertical slew characters, expands tabs, handles backspaces and line length overflows, and ends logical records on vertical slews. If the output must be written in BCD, ASCII characters are converted to BCD.

Once all the media code processing has taken place, the media-specific function calls "PC.RAW". PC.RAW places the processed characters sequentially into the record image via the working tally (.INREC in the IOV). The record image lies within the block image.

6.2 Outputting Logical Records

When the tally runs out, PC.RAW passes control to REC.PT to output the logical record. On terminals and the console, REC.PT is one of

		  PT.TTY   or   PT.CON

which perform the appropriate system calls to output the record. For blocked record devices, REC.PT is one of:

PT.RC2: for BCD card image (Media Code 2). The routine discards any characters written past column 80.
PT.RC4: for binary byte stream (Media Code 4). This routine starts a new record at the end of each block.
PT.M10: for ASCII card images (Media Code 10). The routine discards any characters written past column 80.
PT.REC: for all other media. This partitions the record if it cannot fit in a single block.

The PT.REC function builds the RCW (partitioned if necessary) for the last record, advances the available block pointer (.INBLK) and constructs tallies (.RECRD, .INREC) to this area.

In order for the above routines to make decisions about the lengths of blocks (e.g. for PT.RC2 to discard characters written past column 80) they construct their tallies one character too long. For example, PT.RC2 builds a tally of 81 characters, and PT.REC builds tallies one character longer than the space remaining in the block. PT.RC4 is the exception to this: it builds a tally that is exactly the size of the next logical block.
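
One way to read the "one character too long" design is shown in the Python sketch below. It models only the card image case: the function write_card_image is hypothetical, and treating an exhausted 81-character tally as the overflow signal is an interpretation of the text above rather than a description of the actual code.

```python
CARD_COLUMNS = 80


def write_card_image(chars):
    """Return (card image, overflowed) for one logical record."""
    tally = CARD_COLUMNS + 1          # one position more than the card holds
    image = []
    for ch in chars:
        if tally == 0:                # tally ran out: discard the rest
            break
        image.append(ch)
        tally -= 1
    overflowed = tally == 0           # only reachable with more than 80 chars
    return "".join(image[:CARD_COLUMNS]), overflowed
```

A record of exactly 80 characters leaves one position unused, so the tally does not run out; the tally is exhausted only when an 81st character arrives.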

6.3 Reaching the End of the Logical Block

At the end of a logical block, the routines that output a record build the next logical block by calling "BLK.PT". BLK.PT increments the BSN and builds the BCW for the block. It then calls one of the following:

PT.BLK: outputs a physical block and sets up for the next block.
PT.CUR: outputs the logical block, but doesn't advance the block pointers.
PT.EOF: outputs a logical end-of-file (which should be part of the record).

Except for PT.CUR, all of these advance the logical block pointer (the upper half of .BLOCK) to point to the next logical block.

The PT.BLK routine performs the physical write operation by calling RAW.IO+1. This invokes either WR.DSK to write to a disk or WR.TAP to write to tape. After the write, the function resets the logical block pointer to the top of the physical block.

After the write operation, PT.BLK calls IO.ERR if an error was detected. IO.ERR may or may not write out an error diagnostic message, depending on the setting of the MSS.FL bit in FL.IOV. Then it either exits or returns, depending on the setting of the RET.FL bit in FL.IOV.

When control returns to PT.BLK, it sets the available block pointer (.INBLK) to point to the top of the logical block.

6.4 Outputting a Character String

The process of outputting a string (e.g. with the B function putstr) is based on the facilities available for outputting characters.

The low level library routine for outputting a string is called "PT.DES". This breaks the string into substrings, breaking at any slew characters or unprintables. For example, consider the string

		  Hello*nHow are you?

where '*n' represents a new-line character, written as '\n' in the C programming language. PT.DES would break this into

		  Hello
		  *n
		  How are you?

PT.DES calls CHA.PT to output unprintable characters, and special characters like the '*n'. For simple strings (like "Hello" and "How are you?" above), PT.DES builds an EIS descriptor and a length, then passes them to DES.BU as specified in the IOV. The DES.BU routine is one of

PD.BCD: for BCD variable length and card image output.
PD.ASC: for regular ASCII.
PD.1CH: for print images and COMDECKS.

In easy cases, the DES.BU routine uses an EIS move (with translate if required) to put the substring into the record image. It advances the working tally (.INREC) by the appropriate amount and calls REC.PT if the tally runs out. For print images and COMDECKS, PD.1CH passes each character of the string individually to CHA.PT to facilitate overflow processing and blank and slew optimization.
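
The splitting performed by PT.DES can be approximated with the Python sketch below. The function split_for_output is hypothetical, and Python's str.isprintable is used as a stand-in for the library's own test for slew and unprintable characters, which may differ in detail.

```python
def split_for_output(text):
    """Split text into runs of printable characters and single special characters."""
    pieces = []
    run = ""
    for ch in text:
        if ch.isprintable():
            run += ch                 # extend the current simple substring
        else:
            if run:
                pieces.append(run)    # would be handed to DES.BU
                run = ""
            pieces.append(ch)         # would be handed to CHA.PT
    if run:
        pieces.append(run)
    return pieces
```

Applied to the example string, with the new-line written as '\n', this produces the three pieces "Hello", the new-line, and "How are you?".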

6.5 End of File

Functions like CLOSE and EOF need to be able to output an end-of-file to an output stream. To do this, they call EOF.FI from the IOV, which is one of the following functions:

PT.EOF: for disk files. PT.EOF begins by placing a logical EOF (0170000) in the current logical block. It then calls PT.CUR to flush the physical block.
TP.EOF: for tape files. TP.EOF calls BLK.PT, then writes a physical end file mark and appropriate trailer labels.

6.6 COMDECKS

The COMDECK writer, PC.CDK, performs some special processing. PC.CDK is invoked when you attempt to write Media Code 1 using character stream output functions. Since COMDECK records have a fixed size, and Media Code 1 is also the normal media code to use for variable length binary, special processing is required to reconcile fixed lengths within a variable format.

First, PT.MC1 loads REC.PT to do normal variable length record processing. When the first character is passed to PC.CDK, it starts a COMDECK and replaces the current CHA.PT with a pointer to a routine inside PC.CDK. It also replaces REC.PT with a routine that:

wraps up the COMDECK;
resets CHA.PT to PC.CDK; and
resets REC.PT to PT.REC.

As a result, once a COMDECK is started, it is terminated by the next call to REC.PT. Normally this happens because of a call to CLOSE, but it can also happen because of a call to PUTREC, PUTBIN, or SET.MC.
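
The swapping of CHA.PT and REC.PT can be sketched in Python as below. The class ToyOutputIOV and the method names cdk_inner and cdk_finish are hypothetical; the log list exists only so that the order of events can be observed.

```python
class ToyOutputIOV:
    """Toy IOV showing how COMDECK output swaps CHA.PT and REC.PT."""

    def __init__(self):
        self.log = []
        self.cha_pt = self.pc_cdk
        self.rec_pt = self.pt_rec

    def pc_cdk(self, ch):
        # First character: start a COMDECK and install the special routines.
        self.log.append("start COMDECK")
        self.cha_pt = self.cdk_inner
        self.rec_pt = self.cdk_finish
        self.cha_pt(ch)

    def cdk_inner(self, ch):
        self.log.append("char " + ch)

    def cdk_finish(self):
        # Wrap up the COMDECK and restore the normal routines.
        self.log.append("end COMDECK")
        self.cha_pt = self.pc_cdk
        self.rec_pt = self.pt_rec

    def pt_rec(self):
        self.log.append("normal record")
```

After cdk_finish runs, the IOV is back in its starting configuration, so the next character written would start a new COMDECK.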

7. Seeking Operations

This chapter looks at the functions performed in "seek" and "tell" operations: jumping to a different position in the file, and determining your current position after a "seek".

7.1 The "fpos_t" Structure Type

The C fpos_t structure type is used to represent a position inside a file, for use with the fgetpos and fsetpos functions. An fpos_t structure is four machine words long, with the following components:

FP.FIL

is the file number. This is used when a file stream is actually a concatenated list of files, as in

		file1+file2+file3

FP.FIL tells whether you are looking at the first file, the second file, etc.

FP.BLK

is the block number within the file.

FP.REC

is the record number within the block.

FP.CHR

is normally the character offset within the record. However, if FP.REC has the special value -1, FP.CHR is the character offset within the block rather than within the record.

The symbol FP.SZ has the value of the size of an fpos_t structure, represented as a B "size".

The two low level routines that work with fpos_t structures are ".FGPOS" and ".FSPOS".

.FGPOS: is called with a unit number and a pointer to an fpos_t structure. .FGPOS fills in the structure with information representing the current read/write position on that unit.
.FSPOS: is called with a unit number and a pointer to an fpos_t structure, as created by .FGPOS. .FSPOS changes the current read/write position to the position specified by the fpos_t structure.

If you are using a concatenated list of files, the current version of the library only lets you execute .FSPOS on the first file in the list. In other words, the file number FP.FIL must be 1. You may not seek to another file, and you may not seek back to the first file if you have moved on to a different file.

If you call .FSPOS with the value -1 in the FP.REC entry, .FSPOS interprets the FP.CHR value as a character offset from the beginning of the block, rather than a character offset within a particular record of the block.
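
The four components and the two rules above (the special value -1 in FP.REC, and the restriction to the first file) can be summarized in a short Python sketch. The class FPos, its field names, and the helper functions are hypothetical; the real structure is simply four machine words.

```python
from typing import NamedTuple


class FPos(NamedTuple):
    fp_fil: int   # FP.FIL: file number within a concatenated list (1 = first)
    fp_blk: int   # FP.BLK: block number within the file
    fp_rec: int   # FP.REC: record number within the block, or -1
    fp_chr: int   # FP.CHR: character offset


def describe(pos):
    """Say in words how the FP.CHR value is to be interpreted."""
    if pos.fp_rec == -1:
        return "file %d, block %d, character %d of the block" % (
            pos.fp_fil, pos.fp_blk, pos.fp_chr)
    return "file %d, block %d, record %d, character %d of the record" % (
        pos.fp_fil, pos.fp_blk, pos.fp_rec, pos.fp_chr)


def can_fspos(pos):
    """Mirror the stated restriction: only the first file may be a seek target."""
    return pos.fp_fil == 1
```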

7.2 The fseek and ftell Functions

The C functions fseek and ftell are older, UNIX-inspired functions that use a single integer to represent a position in a file. For the purposes of this manual, we will call such an integer a seek integer.

The format of a seek integer depends on the type of file on which you intend to seek.

For most files, a seek integer has two parts. The upper 14 bits represent a block number within a file; the lower 22 bits represent a character offset within that block. The character offset is calculated by checking the RCWs of each record, and assuming that each record before the specified offset contains the maximum number of characters allowed for by the RCW. Suppose, for example, that the RCW of the first record shows 25 words of data; then that record would be assumed to contain 100 characters (4*25) even though the last word in the record might not be full. In this way, the library can calculate character offsets by jumping from record to record, rather than being forced to count how many characters are actually contained by each record.
For a file containing only Media 4 records, the user can request a different seek integer format. In this alternate format, a seek integer is just a number giving the offset of a character position from the beginning of the file (index 0). Therefore a seek integer of 1000 indicates the 1001st character in the file.

In both cases, seek integers may not allow enough range to address every character in a file, if the file is big enough.

Both types of seek integers can be translated directly into fpos_t structures. However, there is no easy way to translate an fpos_t structure into a seek integer, and indeed, it may be impossible to translate some fpos_t structures into seek integers.

The current implementation of fseek simply translates the seek integer into an fpos_t structure and calls .FSPOS.
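
The standard seek integer format and the "assume every record is full" rule can be illustrated with the Python sketch below. The function names are hypothetical. The sketch also assumes that records are numbered from zero and that control words are not counted in the character offset; neither detail is stated above.

```python
BLOCK_BITS = 14
OFFSET_BITS = 22
CHARS_PER_WORD = 4


def make_seek_integer(block, offset):
    """Pack a block number and a character offset into one seek integer."""
    return (block << OFFSET_BITS) | offset


def split_seek_integer(seek):
    """Return (block number, character offset within the block)."""
    block = (seek >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    offset = seek & ((1 << OFFSET_BITS) - 1)
    return block, offset


def offset_to_record(rcw_word_counts, offset):
    """Locate a block offset, treating each record as words * 4 characters.

    Returns (record index, character offset within that record).
    """
    for index, words in enumerate(rcw_word_counts):
        size = words * CHARS_PER_WORD
        if offset < size:
            return index, offset
        offset -= size
    raise ValueError("offset lies beyond the records in this block")
```

For instance, if the first record's RCW shows 25 words, it is assumed to hold 100 characters, so offset 130 in the block falls at character 30 of the second record.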

7.3 The Seeking Process

The sections that follow describe the steps required to seek to a new file position.

7.3.1 Step 1: Get the Logical Block into Memory

.FSPOS begins by calling a function named ".LBSEEK" (which is actually truncated by the linker to ".LBSEE"). It's the job of .LBSEEK to make sure that the desired logical block is present in memory.

For output units, .LBSEEK must flush any pending output before changing positions within the stream. Thus .LBSEEK calls the appropriate routines from the IOV to finish off the current output record, flush it to the stream, and write an end-of-file mark, if appropriate. For input units, no special processing is required.

Next, .LBSEEK checks to see if the desired logical block is already in memory. If so, .LBSEEK simply adjusts the relevant IOV pointers to point to the block and returns.

If the logical block is not in memory, .LBSEEK must read in the specific piece of the file that contains the block. It does this by calling GT.SPC. After GT.SPC returns, .LBSEEK adjusts the IOV pointers to point to the appropriate logical block within the data just read.

7.3.2 Step 2: Adjustments for Partitioned Records

.FSPOS must next check to see if the desired logical block is in the middle of a partitioned record. If so, .FSPOS must find the start of the record to determine the media code of the record.

To find the start of the record, .FSPOS calls .LBSEEK repeatedly to walk backwards through the file. The partition number of the current block indicates the number of blocks to walk backwards (modulo 1024). Once the search reaches the start of the record, .FSPOS makes note of the media code, then calls .LBSEEK once more to get the desired logical block into memory.

7.3.3 Step 3: Reading Past the High Water Mark

The .LBSEEK function only works up to the high water mark of the file, as specified by HI.WAT in the IOV. If the desired seek position lies farther on, .LBSEEK only goes as far as the high water mark. From that point onward, .FSPOS reads through subsequent records one at a time (using REC.GT) until it reaches the desired logical block.

This process is necessary because the library does not automatically determine the end-of-file position. The library knows that the file goes as far as the high water mark, but it doesn't know how much farther the file extends. Thus .FSPOS must read one record at a time to make sure that none of those records represents end-of-file and that the file has no invalid contents.

If .FSPOS reaches end-of-file before reaching the desired seek position, there are two cases:

On most files, this situation is considered an error.
On files whose contents were created as C "binary format" files, the library uses null bytes to pad the file out to the specified position.

7.3.4 Step 4: Finding the Right Record

At this point, .FSPOS has finally managed to ensure that the correct logical block is in memory. It now uses REC.GT to count through the records in the block until it finds the desired record (specified by FP.REC in the fpos_t structure).

7.3.5 Step 5: Entering an Ambiguous State

At this point, .FSPOS has identified the record containing the target file position. However, .FSPOS can't directly move to the desired character position. The reason is that the library can't tell whether the next operation on the unit will be a read or a write; therefore the library doesn't know whether to set up the IOV for reading or writing.

As a result, the library stores the value of the character offset FP.CHR in the IOV and changes various IOV entries to point to specialized functions:

		  CHA.GT  points to  GC.AMB
		  CHA.PT  points to  PC.AMB
		  DES.BU  points to  PD.AMB

All of the ".AMB" functions are specialized ones to handle the ambiguous state of the IOV. (Note that it is not necessary to change the STR.GT function pointer, since GETSTR performs at least one CHA.GT to make sure a proper record has been set up and that the correct STR.GT function has been initialized.)

7.3.6 Step 6: Resolving the Ambiguous State

The IOV remains in its ambiguous state until the program attempts a read or write operation on the unit, thereby activating one of the ".AMB" functions in the IOV.

If the program activates the reading function GC.AMB, the function uses CHG.RD to change the IOV to read mode. Then it retrieves the stored character offset and reads that many characters into the current record, which positions the file to read the appropriate character. The function then restores the previous versions of CHA.GT, CHA.PT, and DES.BU and proceeds with the requested read operation.

If the program activates one of the writing functions (PC.AMB or PD.AMB), the situation is more complicated. First, the activated function retrieves the stored character offset. Next it allocates a buffer and reads characters from the file into the allocated buffer, up to the character before the designated offset. At this point, the allocated buffer contains the partial record up to the seek position. The activated function then calls CHG.WR to change the IOV to write mode, and uses DES.BU to output the partial record from the buffer to the logical block.

At the end of all this, the IOV is in write mode and is ready to write to the specified seek position. The activated function restores the previous versions of CHA.GT, CHA.PT, and DES.BU, and proceeds with the requested write operation.
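
The following C sketch illustrates the idea of deferring the read/write decision by swapping function pointers. It is a model under stated assumptions, not the library's implementation: the structure layout and all function names are hypothetical, the DES.BU entry is omitted, and the record is held in memory so the write path does not need the buffer-and-rewrite step described above.

```c
#include <stddef.h>

struct iov;
typedef int (*getc_fn)(struct iov *);
typedef int (*putc_fn)(struct iov *, int);

/* Hypothetical, much reduced IOV. */
struct iov {
    getc_fn cha_gt;      /* current "get character" entry         */
    putc_fn cha_pt;      /* current "put character" entry         */
    getc_fn saved_gt;    /* normal entries, kept while ambiguous  */
    putc_fn saved_pt;
    char   *rec;         /* record containing the seek target     */
    size_t  len;         /* number of characters in the record    */
    size_t  pos;         /* current offset within the record      */
    size_t  amb_off;     /* stored character offset (FP.CHR)      */
    int     writing;     /* 0 = read mode, 1 = write mode         */
};

static int gc_normal(struct iov *v)
{
    return v->pos < v->len ? (unsigned char)v->rec[v->pos++] : -1;
}

static int pc_normal(struct iov *v, int c)
{
    if (v->pos >= v->len)
        return -1;
    v->rec[v->pos++] = (char)c;
    return c;
}

/* Common tail of both ".AMB" stand-ins: pick a mode, move to the
   stored offset, and put the normal entries back. */
static void resolve(struct iov *v, int writing)
{
    v->writing = writing;        /* stands in for CHG.RD / CHG.WR */
    v->pos = v->amb_off;
    v->cha_gt = v->saved_gt;
    v->cha_pt = v->saved_pt;
}

static int gc_amb(struct iov *v)        { resolve(v, 0); return v->cha_gt(v); }
static int pc_amb(struct iov *v, int c) { resolve(v, 1); return v->cha_pt(v, c); }

void iov_init(struct iov *v, char *rec, size_t len)
{
    v->cha_gt = gc_normal;  v->cha_pt = pc_normal;
    v->saved_gt = NULL;     v->saved_pt = NULL;
    v->rec = rec;  v->len = len;  v->pos = 0;
    v->amb_off = 0;  v->writing = 0;
}

/* Called at the end of a seek: remember the offset, defer the rest. */
void enter_ambiguous(struct iov *v, size_t offset)
{
    v->amb_off  = offset;
    v->saved_gt = v->cha_gt;
    v->saved_pt = v->cha_pt;
    v->cha_gt   = gc_amb;
    v->cha_pt   = pc_amb;
}
```

Callers always go through `v->cha_gt` or `v->cha_pt`, so they never need to know whether the IOV is in its ambiguous state; the first call after a seek pays the cost of resolving it.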

7.3.7 Notes on Seek Operations

The CH.NL entry of the IOV is set to -1 after a seek operation, since the previous character count is not known. If you then attempt an UNGETC followed by an .FGPOS to determine the position, you will get an error unless you are at EOF.

If you are at EOF and CH.NL is -1, .FGPOS must determine the character position by looking at the record just before the EOF. Therefore .FGPOS looks at the previous record and counts characters to determine the character offset.

Combining seek operations with ungetc can confuse .FGPOS. For example, if you seek to the beginning of a record, then ungetc a character and call fgetpos, the result is usually an error condition, since .FGPOS can't figure out where it is.

7.4 Rewind Operations

Rewind operations are similar to seeking to the beginning of the file. However, there is one exception. Instead of putting ".AMB" functions into the IOV, the rewind operation puts in special functions named GC.REW, PC.REW, and so on. These functions simply reinitialize the IOV for reading or writing, depending on the nature of the next I/O call.

8. Error Handling

This chapter looks at library facilities for error handling and recovery from errors.

8.1 Error Tables

An error number is a 36-bit quantity representing a type of error recognized by the library. The upper 18 bits of the error number are an index into a list of error tables; the lower 18 bits identify an error in that table.
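
As an illustration, the split into two 18-bit halves can be written in C as follows. The sketch assumes the 36-bit quantity is carried in the low bits of a 64-bit integer; the type and function names are hypothetical and do not appear in the library.

```c
#include <stdint.h>

/* A 36-bit error number held in the low bits of a 64-bit integer. */
typedef uint64_t errnum_t;

#define HALF_MASK UINT64_C(0777777)   /* 18 one-bits, written in octal */

/* Upper halfword: which error table. */
unsigned table_index(errnum_t e)
{
    return (unsigned)((e >> 18) & HALF_MASK);
}

/* Lower halfword: which error within that table. */
unsigned error_index(errnum_t e)
{
    return (unsigned)(e & HALF_MASK);
}

/* Combine a table index and an error index into one error number. */
errnum_t make_errnum(unsigned table, unsigned err)
{
    return ((errnum_t)(table & HALF_MASK) << 18) | (err & HALF_MASK);
}
```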

An error table is a GMAP vector used in issuing error messages. The .STRER function maintains a list of pointers to the error tables defined by the library, so that each table may be referred to by an index into this list.

The first word of the table is a pointer to a function which can "decode" the rest of the table. In order to issue an error message from the table, you invoke this function, passing it a pointer to the table and an error number, indicating which message you want to issue. It's the job of the function to use the bottom 18 bits of the error number to find the appropriate message within the table. Note that different tables may have different formats and therefore different functions to work with those formats.

The most common table format is identified by the name ".ER.TA". It has the form:

First word: Transfer instruction to handler function.
Second word: A pointer to a table of printf-style strings. These strings form the basis of the error messages, although the function referenced by the first word (above) may use these strings in a variety of ways.
Third word: An integer telling how many strings are in the table.

At present, the library uses three error tables with this format:

.E.FMS: for errors detected by the FMS system (for example, the user doesn't have permission to access a requested file).
.E.DIO: for I/O errors detected by other GCOS8 system software (for example, hardware problems with the I/O device).
.E.LIB: for errors detected by the UW Tools library.

Programmers writing new library routines may create their own .ER.TA error tables or may create new error table formats. To add these to the library, you have to add your table to a list that appears in the library's .STRER function. .STRER assigns an index value to each such table, and each table gets its own range of error numbers.

Since the value of an error number tells both the error table index and the number of an error in that error table, .STRER can locate an error message just from the error number.
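
A rough C model of this lookup is shown below. It is only a sketch: the real tables are GMAP vectors whose first word is a transfer instruction, whereas here the first member is an ordinary function pointer. The table contents, the message strings, and the order of the tables in the list are all invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct err_table;
typedef const char *(*decode_fn)(const struct err_table *, uint64_t errnum);

/* Model of the ".ER.TA" layout: handler, string table, string count. */
struct err_table {
    decode_fn          decode;    /* first word: handler for this format   */
    const char *const *strings;   /* second word: printf-style strings     */
    size_t             count;     /* third word: number of strings         */
};

/* Handler for this format: the bottom 18 bits select the string. */
static const char *decode_er_ta(const struct err_table *t, uint64_t errnum)
{
    size_t i = (size_t)(errnum & 0777777);
    return i < t->count ? t->strings[i] : NULL;
}

/* Hypothetical table contents. */
static const char *const fms_msgs[] = { "no error", "permission denied" };
static const char *const lib_msgs[] = { "no error", "bad unit number",
                                        "invalid seek" };

static const struct err_table e_fms = { decode_er_ta, fms_msgs, 2 };
static const struct err_table e_lib = { decode_er_ta, lib_msgs, 3 };

/* The list of tables; the position in this list is the table index. */
static const struct err_table *const tables[] = { &e_fms, &e_lib };

/* Find a message from the error number alone. */
const char *lookup_message(uint64_t errnum)
{
    size_t t = (size_t)((errnum >> 18) & 0777777);
    if (t >= sizeof tables / sizeof tables[0])
        return NULL;                       /* no such table */
    return tables[t]->decode(tables[t], errnum);
}
```

Because each table carries its own decoding function, a table with a different internal format can be added to the list without changing `lookup_message`.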

8.2 Posting Error Messages

When a library function detects an error, it calls a function named .EPOST to post the error message. The posting process doesn't display the message directly to the user; it simply stores the message so that it can be retrieved later. Thus library functions may post an error message at any time during the error handling and recovery process, presumably at a point when the library can provide the user with the maximum amount of useful information about the error. The posted message can then be output to the user later on, if and when the recovery process determines that the message should be displayed.

The .EPOST function takes at least two arguments:

An error number (the full 36 bits).
A pointer to the format string. If this has the format of a B pointer (zero in the upper halfword, pointer value in the lower), it is assumed to point to a string in the format of B's printf. If it has the format of a C pointer (non-zero value in the upper halfword), it is assumed to point to a string in the format of C's printf.

In addition, .EPOST takes a variable length list of argument values corresponding to any placeholders in the format string.

.EPOST calls printf to format the message, then associates the formatted message with the given error number. In this way, the message is posted in connection with the error number.

.EPOST only keeps one posted message at a time. The next call to .EPOST overwrites the previous posted message, whether the new call specifies the same error number or a different one. Thus the posted error message is always the most recent one generated by the library.
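
The posting behaviour can be modelled in a few lines of C. This is a sketch only: the names `epost`, `posted_message`, and `posted_number` are hypothetical, the buffer size is arbitrary, and the distinction between B-format and C-format string pointers is not modelled.

```c
#include <stdarg.h>
#include <stdio.h>

/* The single posted message and the error number it belongs to. */
static unsigned long long posted_num;
static char posted_msg[256];

/* Format a message and post it, replacing any earlier posted message. */
void epost(unsigned long long errnum, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(posted_msg, sizeof posted_msg, fmt, ap);
    va_end(ap);
    posted_num = errnum;      /* associate the message with the number */
}

const char *posted_message(void)       { return posted_msg; }
unsigned long long posted_number(void) { return posted_num; }
```

For example, calling `epost(5, "cannot open %s", name)` and then `epost(7, "llink %d: bad checksum", 10)` leaves only the second message posted, under error number 7.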

8.3 C's "errno" Facilities

The ANSI standard for C requires implementations to support a simple errno concept. A non-zero value assigned to the variable errno indicates an error condition.

In general, library functions should not assign error numbers directly to errno. Instead, they should call .EPOST. .EPOST automatically assigns the value of its error number argument to errno.

B source code can refer to the errno variable using the name ".ERRNO". This is just another name for the same data object.

The problem with C's errno is that there is only one for the whole program. Asynchronous conditions (e.g. the user pressing BREAK) can overwrite a different value on top of the errno generated by an I/O error. Therefore, you can't really depend on the current value of errno actually reflecting the error that originally triggered the error condition. It is best to use functions like .EPOST instead, which keep track of a separate error number for each IOV and which do not get overwritten during asynchronous interrupts. When writing your own functions, you can also indicate errors by passing back "error status" return values rather than assigning a value to errno.

8.4 Default Messages

In addition to the posted error message, .EPOST maintains a list of default error messages for each error number. If you call .EPOST without a format string (just an error number), .EPOST gets rid of the posted error message and substitutes the default message.

The default messages may be more general than specifically posted error messages; they cannot provide the sort of specific information you can put into a posted message. However, the default message is often enough to tell the user what went wrong.

8.5 Retrieving a Posted Message

To obtain the posted message associated with a particular error number in a C program, use the __STRERROR function (with two underscores at the beginning). This function takes an error number as its argument and returns the "best" error message associated with that error number. For example,

		  error(__strerror(number));

uses __STRERROR to obtain the "best" error message associated with that error number. If the error number matches the error number of the currently posted message, __STRERROR returns the posted message; otherwise, it returns the default message associated with the error number.
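
The selection rule can be sketched as follows. The structure `posted`, the function `best_message`, and the stand-in `sample_default` are hypothetical names used only for this illustration; they are not the library's interfaces.

```c
#include <stddef.h>

/* Model of the currently posted message. */
struct posted {
    unsigned long long errnum;    /* error number it was posted under */
    const char        *message;   /* formatted text, or NULL if none  */
};

/* Stand-in for the default message lookup (illustrative text only). */
const char *sample_default(unsigned long long errnum)
{
    (void)errnum;
    return "default message";
}

/* Return the posted message if it belongs to errnum, else the default. */
const char *best_message(const struct posted *p, unsigned long long errnum,
                         const char *(*default_for)(unsigned long long))
{
    if (p != NULL && p->message != NULL && p->errnum == errnum)
        return p->message;
    return default_for(errnum);
}
```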

B programs may invoke __STRERROR under the name ".STRER". Both names refer to the same function.

In addition to __STRERROR, there are two other functions which obtain error messages:

STRERROR: is an ANSI standard C function which returns the default error message associated with a particular error number.
_STRERROR: (with a single leading underscore) does the same thing as STRERROR.

Since STRERROR is recognized by the ANSI standard, it should be used in programs that require maximum portability. However, __STRERROR (with two leading underscores) returns the posted error message whenever appropriate, and therefore provides better quality error messages overall.

8.6 The IO.ERR Function

When an I/O function encounters an error on a stream, it calls the IO.ERR function to deal with that error. IO.ERR is invoked with the following arguments:

A unit number, indicating an IOV.
An error number, telling what kind of error occurred.
A string in the format accepted by .EPOST, to serve as the posted error message.

If the flags in the IOV indicate that the program should terminate on the encountered I/O error, IO.ERR calls .EPOST to post the error message. IO.ERR then outputs the message to the standard error stream and terminates the program.

If the flags in the IOV indicate that the program would like to attempt recovery from the error, IO.ERR begins by putting the indicated IOV into an error state. It does this by copying all the function pointers from the IOV to a save area referenced by the ERR.SV entry in the IOV. IO.ERR then replaces all these function pointers with pointers to error functions. The error functions simply return an end-of-file indication whenever they are called. In this way, any I/O operation executed on an IOV in an error state simply results in EOF.

After putting the IOV into an error state, IO.ERR calls .EPOST to post the error message, then returns to the user program. The user program may identify the type of error encountered on the unit with the function call

		  errnum = f.err(unit);

To issue an error message, the program may call

		  io.err(unit);

specifying the unit where the error occurred. Note that in this call to IO.ERR, you do not specify an error number or a message.

The format of a diagnostic message issued by IO.ERR is

		  file: location: message

as in

		  user/myfile: llink 10: invalid checksum

8.6.1 Clearing an Error

When the user program wishes to clear the error state of an IOV, it calls a standard function (like C's clearerr), which in turn invokes the CLR.ER function entry in the IOV. The default version of CLR.ER simply restores the saved function pointers from ERR.SV, to put the IOV back into a normal state. For some types of errors, CLR.ER may point to a different function: one that does special processing to recover from the error, before restoring the function pointers.
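
The save-and-restore mechanism can be sketched in C as shown below. This is a model, not the library's code: the IOV is reduced to two function entries, all names are hypothetical, and the guard against saving twice is an assumption of the sketch rather than documented behaviour.

```c
#include <stdio.h>   /* for EOF */

struct iov;
typedef int (*getc_fn)(struct iov *);
typedef int (*putc_fn)(struct iov *, int);

struct iov_funcs {
    getc_fn cha_gt;
    putc_fn cha_pt;
};

/* Hypothetical, much reduced IOV. */
struct iov {
    struct iov_funcs   fn;        /* active function entries          */
    struct iov_funcs   err_sv;    /* save area (models ERR.SV)        */
    unsigned long long errnum;    /* error recorded for this unit     */
    int                in_error;  /* nonzero while in the error state */
};

/* Error functions: every operation just reports end-of-file. */
static int err_get(struct iov *v)        { (void)v; return EOF; }
static int err_put(struct iov *v, int c) { (void)v; (void)c; return EOF; }

/* Placeholder "normal" functions so the sketch can be exercised. */
int sample_get(struct iov *v)        { (void)v; return 'x'; }
int sample_put(struct iov *v, int c) { (void)v; return c; }

/* Put the IOV into an error state. */
void enter_error_state(struct iov *v, unsigned long long errnum)
{
    if (!v->in_error)             /* assumption: keep the first save */
        v->err_sv = v->fn;
    v->fn.cha_gt = err_get;
    v->fn.cha_pt = err_put;
    v->errnum = errnum;
    v->in_error = 1;
}

/* Model of the default CLR.ER: restore the saved entries. */
void clear_error(struct iov *v)
{
    if (v->in_error) {
        v->fn = v->err_sv;
        v->errnum = 0;
        v->in_error = 0;
    }
}
```

A recovery-specific version of the clearing function would do its extra processing first and then perform the same restore.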

9. Auxiliary Functions

This chapter looks at various auxiliary functions associated with I/O. For further information on any of these functions, see the explanations under "expl b lib".

9.1 Flushing a Stream

The FLUSH function calls FLS.RD for input units and FLS.WR for output units.

FLS.RD uses successive calls to CHA.GT to fetch characters up to and including the next new-line.

FLS.WR causes a logical end-of-record on the output unit by calling PT.REC.

9.2 Backing Up a Stream

The BACKSP function backs up the working tally .INREC by one character (if possible).

9.3 The UNGET Operation

The UNGETC function saves the current value of CHA.GT in the IOV under the name of "UNGT.F". It also keeps a list of all the characters that have been pushed back into the stream with UNGETC; the UNGT.C IOV entry points to this list. UNGETC then changes CHA.GT to point to a routine that obtains characters from the UNGT.C list rather than from actual input, and changes STR.GT to point to GS.1CH so that "get string" operations get characters one at a time through the special CHA.GT.

When the special CHA.GT reaches the end of the list of UNGT.C characters, it restores the old CHA.GT from UNGT.F. The STR.GT function is left as GS.1CH until the next record is obtained.
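
The pointer-swapping scheme can be modelled in C as follows. The sketch makes several assumptions: the pushed-back characters are kept in a small fixed-size stack, the old entry is restored as soon as the last pushed-back character is returned, the STR.GT change is omitted, and every name is hypothetical.

```c
#include <stddef.h>

#define UNGT_MAX 8

struct iov;
typedef int (*getc_fn)(struct iov *);

/* Hypothetical, much reduced IOV. */
struct iov {
    getc_fn     cha_gt;            /* current "get character" entry */
    getc_fn     ungt_f;            /* saved entry (models UNGT.F)   */
    int         ungt_c[UNGT_MAX];  /* pushed-back characters        */
    size_t      ungt_n;            /* how many are pending          */
    const char *src;               /* stand-in for the real input   */
};

/* Normal input: take the next character from the source string. */
static int gc_source(struct iov *v)
{
    return *v->src ? (unsigned char)*v->src++ : -1;
}

/* Special input: take characters from the pushback list. */
static int gc_unget(struct iov *v)
{
    int c = v->ungt_c[--v->ungt_n];
    if (v->ungt_n == 0)
        v->cha_gt = v->ungt_f;     /* list exhausted: restore old entry */
    return c;
}

void iov_open(struct iov *v, const char *s)
{
    v->cha_gt = gc_source;
    v->ungt_f = NULL;
    v->ungt_n = 0;
    v->src = s;
}

/* Push a character back; returns c, or -1 if the list is full. */
int unget_char(struct iov *v, int c)
{
    if (v->ungt_n == UNGT_MAX)
        return -1;
    if (v->ungt_n == 0) {          /* first pushback: swap the entry */
        v->ungt_f = v->cha_gt;
        v->cha_gt = gc_unget;
    }
    v->ungt_c[v->ungt_n++] = c;
    return c;
}
```

Characters come back in the reverse of the order in which they were pushed, after which reading continues from the real input.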

READ and WRITE: call RAW.IO to perform the I/O. RAW.IO just remaps its arguments and calls RAW.RD or RAW.WR to perform the I/O. These functions affect SEC.NO but not B.SEEK.
GETRCP: uses .INBLK as the pointer to the current record, and calls REC.GT to advance pointers to the next record.
GETREC: reconstructs a whole record in the user's buffer. It calls GETRCP to read the partitions of the record.
GETBIN: is almost the same as GETREC, except that it doesn't return the RCW in the caller's buffer.
PUTREC and PUTBIN: call REC.PT to terminate the record currently being built, if any. Then they use .INBLK to put the user's record into the buffer. If necessary, they partition the record. Finally, they call REC.PT to build new tallies for the remainder of the block.

9.8 Other Functions Using the IOV

The following functions all refer to the IOV in some way.

EOF: tests for end-of-file on input units. First EOF checks to see if UNGETC has stored any characters in the UNGT.C list; if any such characters exist, EOF returns a value indicating that the file has not reached end-of-file. If there are no UNGETC characters, EOF just returns the value of the flag EOF.FL.
For B output units, EOF writes an end-of-file, by calling REC.PT to terminate the last record and EOF.FI to write the end-of-file. For C output units, EOF does nothing, since the ANSI standard for C only recognizes EOF queries on input files.
BACK.D: performs the system call to pass a file to SYSOUT. It then calls .LEAVE to leave the file accessed in the AFT, and calls CLOSE to tear down the IOV.
FILDES: simply returns the IOV entry FL.DES.
GETMED and SET.MC and SET.RC: refer to the IOV entry MED.CD.
F.SIZE: returns the file size in llinks. If the FNV.FL bit is on, F.SIZE returns the negative of the file size, indicating that the value is only an estimate of the file's true size.
