The B Tutorial Guide: Contents

1. Introduction
2. Using B
3. Errors
4. The Layout of a B Program
5. The "main" Function
6. Constants
    6.1  Decimal Integers:
    6.2  Octal Integers:
    6.3  Floating Point Numbers:
    6.4  ASCII Character Constants:
    6.5  BCD Character Constants:
    6.6  String Constants:
7. Simple Variables
    7.1  Auto Variables:
    7.2  External Variables:
8. Simple I/O
    8.1  PRINTF:
    8.2  GETCHAR and PUTCHAR:
9. Simple Expressions
    9.1  Arithmetic:
    9.2  Relational Operators:
    9.3  Logical Operators:
    9.4  Assignments:
10. The If Statement
11. Looping
    11.1  The While Statement:
    11.2  Labels and the goto Statement:
    11.3  Other Looping Statements:
13. Vectors
    13.1  Subscripting:
    13.2  External Vectors:
    13.3  Auto Vectors:
    13.4  Multi-Subscripting:
    13.5  GETVEC and RLSEVEC:
14. More Operators
    14.1  Increments and Decrements:
    14.2  Bitwise Operators:
    14.3  Bit-shifting Operators:
    14.4  The Query Expression:
    14.5  Addressing and Indirection:
15. Escape Sequences
16. Manifest Constants
17. The Switch Statement
    17.1  Break and Next:
18. More on I/O
    18.1  Redirection of I/O:
    18.2  Multiple Units:
    18.3  Closing Files:
    18.4  .READ and .WRITE:
    18.5  Special Units:
    18.6  File Access Conventions:
    18.7  String I/O:
19. More about Functions
    19.1  Arguments:
    19.2  Recursion:
    19.3  Limitations on Function Calls:
20. Program Command Lines
    20.1  Arguments for "main":
    20.2  The Options Table:
    20.3  The Tab Expansion Program:
21. BOFF
    21.1  Examining Dumps:
    21.2  Examining Memory Locations:
    21.3  Tracing a Program:
Appendix A: Binding Strength of Operators
Appendix B: B Compiler Error Messages
Appendix C: Escape Sequences
Appendix D: Partial Index of B Library Routines

1. Introduction

This tutorial guide is designed to provide a working knowledge of the B programming language for those who have never used B before. We've included a variety of sample programs here: numerical programs, string manipulation programs, system-oriented programs, and so on. Naturally though, the best way to learn B is to write programs of your own. At times we will suggest some programs you might want to write, but it's usually better if you choose your own applications.

You should be warned that many of the programming examples which we present here will not show the optimal way of writing up a given program. For example, many of our sample programs perform operations which could be done with a single call to one of the B library functions. Program examples in the earlier sections of this tutorial are often written in a less than optimal way because the reader has not yet been introduced to the more elegant concepts of B. Most of our sample programs are intended to be simple examples of how specific constructs may be used in B; while we will never stoop to bad programming style, you should always bear in mind that our programs are written for instructional purposes, not stream-lined production runs.

In this tutorial, we'll assume that you know at least one programming language already (e.g. Fortran, Pascal, or PL/I). We'll also assume that you're familiar with such things as creating files and text editing. Lastly, we'll assume that you know enough about machine architecture to know the difference between a bit, a byte, and a word.

Because this is a tutorial, we will not cover all the features of B, although we will try to mention most of them. For full details about the B language, you should get a copy of the B Reference Manual. Another useful source of information is the group of "explain" files for B. The most important of these are the files describing the B run-time library functions. To get a description of any function, see "expl b lib <name>" where <name> is the name of the function you want an explanation of (e.g. "expl b lib printf").

We might point out right here at the beginning of the tutorial that B uses a number of characters which are not available on some terminals (e.g. '{', '}', '|'). If your terminal does not have some of these characters, B will accept special escape sequences to represent the characters; for example, you may use the sequence '$(' to stand for '{'. A complete list of these escape sequences is given in Appendix C of this tutorial.

2. Using B

B is especially suited to applications which involve non-numeric computations. Such programs can involve logical decisions, operations on integers and character strings, and low level bit manipulation. Although B can do floating point operations, it does not do them particularly efficiently; thus it might be better to use a different language if you are interested in doing a lot of floating point work.

The usual way to prepare the source code for a B program is to use a text editor like FRED. To help with the readability of your program, B allows you to break statements up into any number of lines and it allows you to indent those lines freely. In fact, B treats new-line and tab characters exactly like spaces, and it allows you to use any number of spaces in place of a single space.

Once you have prepared your source code, you can compile it using the "b" system command. Thus if your source is in the file "mystuff", you would compile it by saying

b mystuff

This command compiles your source and puts the resulting object code into a file called ".h". (If there is no permanent file called ".h" under your current catalog, B will create a temporary file.) The B compiler does not generate a source listing during compilation; if you want a listing of your program, you should use the system command SLIST.

To execute your program once it's been compiled, you use the system command

go filename

where "filename" is the name of the file containing the core image file prepared by the B compiler. If "filename" is not specified, "go" assumes the default file name ".h".

3. Errors

Every programmer knows that programs hardly ever compile perfectly the first time. By default, the B compiler sends all its error diagnostics to the terminal. This has the advantage of showing you your errors right away, but if you are working at a CRT terminal there's always the chance that you will lose some of your diagnostics off the top of the screen. Thus many people prefer to redirect their errors from the terminal to a temporary or permanent file. To do this, you compile your source with a command of the form

b mystuff >errs

The above command compiles source code from the file "mystuff" and sends error messages to the file "errs". If there are no errors, the usual ".h" object deck is created so that you can execute your program with "go".

The typical B error message tells the type of error that was encountered and the number of the line where B realized that an error had occurred. Unfortunately, B sometimes doesn't realize that something has gone wrong until several lines after the line that actually contained the mistake. Thus if the line which is specified in an error diagnostic looks all right to you, it's a good idea to go back a bit and check previous lines to make sure they don't have errors.

One of the easiest errors to make in B is leaving off the closing brace bracket ("}") at the end of a compound statement. Usually B won't realize that you've done this until the last line of the function that contains the compound statement. Thus if you get a number of error messages about the last line of a function, the chances are that there is a "}" missing somewhere in the function.

One error can generate a lot of error messages. When there are a number of diagnostics for a single line of code, the first one is usually the only important one. The other messages often just indicate that the first error put the compiler into such a state of confusion that it misinterpreted everything else on the line.

B's error diagnostics are usually self-explanatory. If you need more help to understand the errors in your program, Appendix B of this tutorial discusses most of the compiler's diagnostics in detail.

4. The Layout of a B Program

All B programs are made up of one or more "functions". These functions are like the functions and subroutines of a Fortran program or the procedures of PL/I. Any function can invoke any other function; it can also call itself recursively. When a function is called, the caller can pass zero or more arguments to the called function. Thus the process of calling functions in B is much like the corresponding processes in other programming languages. However, there are a number of important differences that will become apparent in later sections.

One of the nicest features of B is that there is a vast collection of functions already written for the user and stored in the B function library. Some of these functions are written in B and some are written in assembler for efficiency reasons. Most B programs use these functions quite a lot; for one thing, all I/O operations are performed with library functions. Obviously, you should get to know the B library functions well if you want to use B to its full potential.

In the text of this tutorial, we will always refer to library functions by putting their names in upper case (e.g. PRINTF). User-written functions will be in lower case and enclosed in double quotes (e.g. "main"). These conventions are simply for purposes of identification in this tutorial; B itself doesn't pay attention to the case of alphabetic characters, so you can type letters in whatever case you prefer. Most users type all function names in lower case, and that will be the convention we use in our programming examples.

Below we give a rough outline of how functions are laid out in a B program. In later sections, we'll explain things in more detail.

main() {
   --- statements ---
}
func1( arg1, arg2 ) {
   --- statements ---
}
anotherfunc( arg ) {
   --- more statements ---
}

There are three functions in the program above: "func1", "anotherfunc", and the special function "main" which is discussed in the next section. Each function starts with a line that contains the name of the function, followed by the function's arguments in parentheses.

The statements of each function are enclosed in brace brackets ("{}") in order to group them together. Grouping statements with braces is common practice in B; a collection of statements enclosed in braces is known as a "compound statement". It will be a great help in organizing your program (and in avoiding syntactical errors) if you adopt a clear and consistent style for using braces. In this tutorial, we will use one popular style, but there are others. The important thing is to have a style. Throwing around braces in a haphazard way is sloppy programming, and leads to all kinds of problems in reading and debugging.

There is one last point we would like to make about the general structure of B programs. Because you can spread statements over any number of lines without using a continuation character, the compiler must have some way of knowing where one statement ends and the next begins. Thus, simple statements in B always end with a semicolon ';'. Compound statements always begin with a '{' and end with a '}'. There is no ';' after the '}' which marks the end of a compound statement.
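
As a small sketch of these rules, the program below spreads one assignment over three lines; the statement ends only at the semicolon. The variable name "total" is our own choice for illustration, and the auto statement and PRINTF are not explained until Sections 7.1 and 8.1.

main() {
    auto total;
    total = 1 +
            2 +
            3;
    printf( "%d*n", total );
}

If all goes well, this prints 6. Note that the compound statement forming the body of "main" ends with '}' and has no ';' after it.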

5. The "main" Function

Every working B program has a function called "main". Execution of the program always begins with the first statement of "main". Program execution ends with the last statement of "main", unless the program has been terminated earlier with a function like EXIT or .ABORT.

Apart from these distinctions, "main" is exactly like any other function in the program. It can call other functions and can be called by other functions. "main" can appear anywhere in the source code, though most programmers prefer to put it at the start of the source where it's most visible.

6. Constants

In this section we discuss the kinds of constants which can be used in B programs.

6.1 Decimal Integers:

Standard decimal integers are given with no decimal point and no leading zeroes. Below are some examples.

3   7   254000   678

6.2 Octal Integers:

Standard octal integers are given with at least one leading zero. Decimal points are not permitted. Octal integers can only use the digits from zero to seven. Below are some examples.

010   0777777   004567   00000

6.3 Floating Point Numbers:

Any number containing a decimal point is a floating point number. The decimal point cannot be the first character of the number; leading zeroes are allowed. Floating point numbers may include the letter 'e' and a decimal exponent. The exponent may have a "+" or "-" sign. Below are some examples.

7.6   3.   0.2   1.e6   5.982e-4

6.4 ASCII Character Constants:

ASCII character constants consist of one to four ASCII characters enclosed in single quotes. These are stored as nine-bit characters in a single machine word of 36 bits. If the constant has fewer than four characters, the characters are right-justified in the machine word and padded on the left with zero bits. Below are some examples.

'a'   'abc'  '123d'  'x y '
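
One way to see the right-justification at work is to print a constant in octal. The sketch below assumes the storage layout just described and uses PRINTF with the "%o" format, which is not introduced until Section 8.1. Since a nine-bit character is exactly three octal digits, the characters can be read straight off the output.

main() {
    printf( "%o*n", 'a' );     /* 141 */
    printf( "%o*n", 'ab' );    /* 141142: 'b' in the lowest byte, 'a' just above it */
}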

6.5 BCD Character Constants:

BCD character constants consist of one to six BCD characters enclosed in grave accents. These are stored as six-bit characters in a single machine word of 36 bits. If the constant has fewer than six characters, the characters are right-justified in the machine word and padded on the left with zero bits. Below are some examples.

`xyz`   `abc de`   `s123x`   `k`

6.6 String Constants:

A string constant is any sequence of ASCII characters enclosed in double quotes. Below are some examples.

"this is a string"
"hi mom"

When processing a string, B packs four ASCII characters per machine word. The compiler marks the end of the string by appending one extra character, an ASCII null (octal 000). Thus the string "abcd" would require two words of storage: one word for the 'abcd' and another word for the 000 that marks "end-of-string".
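
The storage requirement can be worked out with a little integer arithmetic: a string of n characters needs n+1 characters of storage once the null is counted, and these are packed four to a word. The function below is purely our own illustration (the name "strwords" is hypothetical, not a library routine); it relies on integer division and the return statement, which are discussed later.

strwords( n ) {
    return( n / 4 + 1 );    /* words needed for n characters plus the null */
}

Thus "strwords(4)" is 2, which matches the "abcd" example, and "strwords(16)" is 5, the number of words needed for "this is a string".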

The value of other B constants (decimal, octal, floating point, ASCII, and BCD) is a single machine word containing the internal representation of the given constant. Obviously, the same cannot be true of string constants, since most strings are too long to fit into a single word. For this reason, the value of a string constant is a single machine word containing the address of the actual string in memory. This is an important thing to bear in mind as you are programming. If you were to say

a = "this string";

in a B program, the variable "a" would receive the address in memory where "this string" is stored -- "a" does not receive the string itself. Thus if you printed "a" without being careful, you would be printing an essentially meaningless address, not the string you wanted.
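
A rough sketch of the difference, assuming the PRINTF formats described in Section 8.1:

main() {
    auto a;
    a = "this string";
    printf( "%s*n", a );    /* follows the address and prints: this string */
    printf( "%o*n", a );    /* prints the address itself as an octal number */
}

The second line of output depends on where the string happens to be stored in memory, so we make no attempt to show it here.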

We will have a good deal more to say about strings in later sections.

7. Simple Variables

Variable names in B can be as long as you want, but the compiler only pays attention to the first eight characters. Thus the compiler sees no difference between the names "variable1" and "variable2". Obviously then, names longer than eight characters are for the programmer's convenience only; from the compiler's point of view the first eight characters are the only ones that count.

The names of functions and of external variables (see Section 7.2) are slight exceptions to this rule. These names are used by the loader as well as the compiler, and although the compiler looks at the first eight characters, the loader only looks at the first six. Thus you should make sure that functions and external variables have names that differ in the first six characters.

Identifier names in B can contain any alphabetic character (upper or lower case), any numeric character, and the special characters dot (.) and underscore (_). Names cannot begin with a digit. It is recommended that programmers avoid using names that contain the dot ".". By convention, this character is used to distinguish the variables and functions of the B library routines. If your program uses names containing the dot character, it's possible that naming conflicts will occur and the library routines will begin to act strangely.

The most important thing about variables in B is that they are typeless. In Fortran, variables have a specific type like "real", "integer", "double precision", and so on. Many other programming languages share the notion that variables refer to a specific data "type". This is not true in B. For example, it is perfectly legitimate in B to say

var = 'a';
var = var + 1;

B doesn't care that a variable is treated like an ASCII character one minute and like an integer the next. If you executed the statements above, the variable "var" would end up with the value 'b', since the ASCII representation of 'b' is one more than the ASCII representation of 'a'. Since an ASCII 'b' is octal 0142, "var" could also be considered to contain the octal constant 0142 which is equivalent to the decimal constant 98 which is equivalent to the floating point number 7.30156898e-7, and so on. As far as B is concerned, all of these interpretations are equally valid.

Thus you can perform floating point arithmetic with BCD character constants in B, and the compiler will not complain a bit. Naturally, you can't expect to get meaningful results, but you won't get error messages either. Most users find this freedom to be one of the most significant advantages of B over other languages. Basically, a variable is just a word in memory and B doesn't concern itself with the interpretation of that word's contents.

In some ways, calling B a typeless language is a little misleading: after all, data types like "integer", "real", "character" and so on do exist in B as was seen in Section 6. The point is that it's up to the programmer to make sure that variables have the correct "type", because the compiler doesn't do this kind of babysitting. B doesn't care if the variable "x" contains an integer or the address of an ASCII string. However, if you take this same sort of attitude, your programs are likely not going to work. Obviously, you have to have a pretty good idea of what your variables have in them or else you'll find yourself in trouble.
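
To make the point concrete, here is a short sketch that prints one and the same word under three different interpretations. It assumes the PRINTF formats described in Section 8.1.

main() {
    auto var;
    var = 'a';
    var = var + 1;
    printf( "%c %d %o*n", var, var, var );
}
  prints: b 98 142

The contents of "var" never change between the three conversions; only the format tells PRINTF how to interpret them.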

7.1 Auto Variables:

Auto variables are variables which are only known within a single function. In other programming languages, such variables are known as "local" variables.

Auto variables must be declared in the function that uses them. There are two places this can be done. The first place is in the argument list which follows the function name in the first line of a function (see Section 4). Thus all of a function's arguments are auto variables. The second place is in an auto statement. The function below uses both of these types of auto variable declaration.

func( a ) {
    auto b, c, x;
    c = 3;
    b = 7;
    x = (a + b) * c ;
    return( x );
}

The function "func" above takes one argument "a" and has three other auto variables "b", "c", and "x". After doing the given calculation, "func" returns the value of "x". (This is the purpose of the return statement in the last line of "func".) Thus "func(1)" would have the value 24.

An auto statement can declare the names of as many auto variables as desired (e.g. the auto statement in the above function declared three auto variables). However, a program is much more readable if auto statements are kept down to a reasonable length (say, half a dozen variable declarations per statement). A function can have as many auto statements as needed or desired, but they should all come before any of the executable statements of the function. The reasons for this are given in the B Reference Manual.

There is one last point to be stressed about auto variables. The auto variables of a function come into existence when the function is called, and they disappear when the function returns. One consequence of this is that auto variables cannot be initialized at the beginning of execution, because they really don't exist at the beginning of execution. Auto variables which are the arguments of a function receive values when the function is called; all the other auto variables in the function contain garbage when the function is invoked, and therefore they must be explicitly assigned values before they can be used.
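
The sketch below shows the usual precaution. The function "sumsq" and its variable "total" are our own inventions for illustration; the point is only that "total" is given a value before it is first used.

sumsq( a, b ) {
    auto total;
    total = 0;                  /* without this, "total" would start out as garbage */
    total = total + a * a;
    total = total + b * b;
    return( total );
}

With the assignment in place, "sumsq(3,4)" has the value 25. Without it, the result would depend on whatever happened to be left in memory.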

7.2 External Variables:

External variables are variables which are available to all the functions of a program. In other programming languages, such variables are known as global variables.

Any function wishing to use a particular external variable must state its intentions in an extrn statement. As with auto statements, extrn statements should be placed at the very beginning of the function in which they appear.

Below we show a function that uses external variables to generate random integers.

randnm() {
    extrn lastrand, scramble;
    lastrand = lastrand * scramble;
    return( lastrand );
}
lastrand {1} ;
scramble {30517578125};

This algorithm for generating random integers was taken from a paper by R.R.Coveyou and R.D.MacPherson, "Fourier Analysis of Uniform Random Number Generators", JACM, Vol.14, No.1, pp.100-119, January 1967.

There are several things you should notice about the above example. The first is that the external variables are specified outside the body of "randnm". This is one of the reasons for the name "external": external variables must be specified outside the body of any function. They can be specified between any two functions if you wish, but it is more standard programming style to list all the external variables at the beginning or end of the program, where they'll be more visible.

The second thing you should notice is that the external variables are followed by numbers in brace brackets when they are specified outside the function body. These numbers are initialization values. Thus "lastrand" is initialized to 1 and "scramble" is initialized to 30517578125 before execution of the program begins. (For those of you who were wondering, the value of "scramble" is really just 5**15. If we had wanted to, we could have specified the initialization value as 5*5*5*...*5, since B will accept constant expressions for initialization values as well as simple constants. We could NOT have specified the initialization value as 5**15, since the exponentiation operator "**" is not supported in B).

External variables can be initialized to any kind of value you want. For example, you can say

a {"a string"};
b {'d' + 5};

and so on. The first statement above initializes "a" to contain the address where the ASCII string "a string" is stored in memory. The second statement initializes "b" to the ASCII character 'i'. The only thing you must remember is that initialization values can only be constant expressions. You can't initialize something to "i+2" where "i" is another variable.

If you don't want to initialize an external variable, just leave out the quantity in brace brackets, as in

xyz ;

The above defines an external variable "xyz" which will contain garbage until it is assigned a value during the course of execution.

The last thing we'd like to point out is that there are TWO ways that another function could get at the random numbers which "randnm" generates. The first way is to obtain the random number from the external variable "lastrand". When execution begins, "lastrand" has the value 1. Successive calls to "randnm" generate a sequence of pseudo-random numbers in the variable "lastrand". Any function can get the current random value of "lastrand", provided it has declared "lastrand" in an extrn statement.

The second way a function can obtain a random value from "randnm" is to make use of the fact that the return statement in "randnm" returns the new random value. Just as "func(1)" in the last section stood for the value returned by the return statement in "func",

x = randnm();

will call "randnm", obtain the random value which "randnm" returns, and assign that value to "x". During the call to "randnm" which the above statement makes, the external variable "lastrand" will receive the new random number as well. Thus

x = randnm();
y = lastrand;

will give "x" and "y" the same value, while

x = randnm();
y = randnm();

will give "x" and "y" two different random numbers.

Note in passing that the parentheses after "randnm" are necessary whenever the function is called, even though there are no arguments being passed. The parentheses tell the compiler that "randnm" is the name of a function and not the name of an ordinary variable.
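
Putting these pieces together, a "main" function that exercises both ways of getting at the random numbers might look like the sketch below. It assumes that "randnm" and its two external variables are defined elsewhere in the same program, and it uses PRINTF, which is described in Section 8.1.

main() {
    extrn lastrand;
    auto x, y;
    x = randnm();
    y = lastrand;
    printf( "%d %d*n", x, y );     /* the same number twice */
    y = randnm();
    printf( "%d %d*n", x, y );     /* two different numbers */
}

Notice that "main" only needs to declare "lastrand" in its extrn statement; it never refers to "scramble" directly.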

8. Simple I/O

As we have mentioned before, all I/O in B is performed by calling library functions. There are many such functions for various specialized I/O purposes. In this section, we will discuss the I/O functions you will likely use the most.

8.1 PRINTF:

PRINTF is an all-purpose output function that prints a number of values according to a format string. Format strings can include some fairly complicated constructs, but in this tutorial we will only talk about the basics.

The five simplest format strings are

"%c" -- ASCII character
"%d" -- decimal number
"%h" -- BCD character
"%o" -- octal number
"%s" -- ASCII string

Thus a command like

printf( "%o", x );

would print the value of the variable "x" as an octal number on the current output unit. (When your program begins execution, the current output unit is the terminal. We'll discuss ways to redirect the destination of your output in Section 18.1.)

When a format string contains characters which aren't part of a recognized format like the five above, the characters are simply copied without change to the current output unit. Thus if you said,

x = 12;
printf( "The value of x is %d.", x );

your output would be

The value of x is 12.

The normal characters are simply copied out, while the value of "x" is substituted for the "%d".

PRINTF will print out any number of variables or expressions. Below are some examples.

printf("The sum of %d and %d is %d.",5,7,12);
  prints: The sum of 5 and 7 is 12.
printf("The octal equivalent of '%c' is %o.",'a','a');
  prints: The octal equivalent of 'a' is 141.
printf("%d + %o equals %s or %c.",1,-1,"zero",'0');
  prints: 1 + 777777777777 equals zero or 0.

Printing floating point numbers is a little trickier than the simple cases above. For full details, see "expl b lib printf".

PRINTF sends its output in a continuous stream to the current output unit; it does not put in new-line characters unless you somehow specify them. This is usually done by using the special escape sequence '*n'. B will replace '*n' with a new-line character whenever it appears inside an ASCII string or an ASCII character constant. Thus

printf( "This is a line*nfollowed by another line.");
  prints: This is a line
          followed by another line.

The '*n' in the middle of the format string indicates a standard new-line character whose effect will be to break the output and begin on a new line.

You must always remember to specify your new-lines where you want them. For example,

printf("hi there");
printf("hello");
  prints: hi therehello

You likely want to say something like

printf("hi there*n");
  which prints: hi there

8.2 GETCHAR and PUTCHAR:

The most straightforward library function for reading input is GETCHAR. GETCHAR reads one ASCII character from the current input unit. Like the current output unit, the current input unit is set to the terminal by default at the beginning of program execution. We will discuss redirecting this unit in Section 18.1.

A call to GETCHAR generally has the form

c = getchar();

GETCHAR will read a single ASCII character and return it to the variable "c". This character is put in the lowermost byte (nine bits) of "c"; the rest of "c" is filled with zero bits.

GETCHAR actually reads in one line of input at a time, but it hands out this input one character at a time. When all the characters in the line have been handed out, the next call to GETCHAR will cause another line to be read in. Don't forget that the last character in a line will usually be a '*n' new-line character, indicating that the line is finished.

One of the most common things to do in a program is to print out a line as you read it in. You could do this with the commands

c = getchar();
printf( "%c", c );

but this isn't particularly efficient. After all, PRINTF is designed to do very complicated output formatting, not simple character echoing. In cases like this, it is usually better to use the library function PUTCHAR. PUTCHAR outputs the non-zero ASCII characters of a single machine word to the current output unit. Thus

c = getchar();
putchar( c );

reads a single ASCII character into "c" and then prints that single character.

There is an even neater way of performing this "read a character and print it" operation. The technique is based on the fact that PUTCHAR returns a function value that is identical to the character it prints. Thus if you were to say

d = putchar(c);

the contents of "c" would be printed out. In addition, the variable "d" would receive the character that PUTCHAR prints, so "d" would be assigned the contents of "c". Thus the above statement has the same effect as

d = c;

Ask yourself what would happen if you wrote

c = putchar( getchar() );

GETCHAR would read in a character and return it to the B program. That character would be the argument that PUTCHAR prints out. "c" would get the character that was printed, when PUTCHAR returns this value. Thus the single statement above reads in the value, prints it out, and assigns the input value to "c".
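
The same nested call can be used to echo a whole line of input. The sketch below is only an illustration: it borrows the while statement from Section 11.1, and it makes no attempt to deal with end-of-file or other unusual conditions.

main() {
    auto c;
    c = putchar( getchar() );
    while( c != '*n' )
        c = putchar( getchar() );
}

Each trip through the loop reads one character, prints it, and saves it in "c". The loop stops once the new-line character at the end of the line has been echoed.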

Some beginners find this function nesting somewhat hard to follow. When you think of it though, it really is no more unusual than typing something like

Y = SQRT( ABS( X ) )

in Fortran.

9. Simple Expressions

We have already been using some very obvious examples of expressions, e.g. "a + b", "c = d", etc. In this section, we will look at expressions in more detail.

In the sections to come, we will be giving a rough description of the order in which operators are evaluated in expressions. For the exact order of operator evaluation in B, see Appendix A. Parentheses may be used in the standard way to change the order of operation.

9.1 Arithmetic:

B supports the standard integer arithmetic operators '+' (addition), '-' (subtraction), '*' (multiplication), and '/' (division). These are evaluated using the usual order of operation: '*' and '/' before '+' and '-', and otherwise left to right. The unary minus sign '-' is also supported in B, so that "-x" gives the negative of "x".

There is one other integer arithmetic operator that is available in B, namely the "remainder" operator '%'.

a % b;

is the integer remainder obtained when "a" is divided by "b". For the mathematically inclined, "a % b" can be thought of as "a" modulo "b".

Note that the operators listed above perform integer operations. To perform floating point operations, different operators are used. (Since variables in B are typeless, the type of an arithmetic operation must be embodied in the operator.)

Most floating point operators are just the integer operators with a '#' in front of them. Thus '#+' indicates floating point addition, '#-' is floating point subtraction, '#*' is floating point multiplication, and '#/' is floating point division.

The unary operator "#" converts integers to floating point; thus "#3" is 3.0. The unary operator "##" converts floating point numbers to integers; thus "##3.0" is 3.
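
The short sketch below puts the integer and floating point operators side by side. The variable names are our own. We convert the floating point result back to an integer with "##" before printing it, since printing floating point values directly is a more complicated matter.

main() {
    auto q, r, f;
    q = 7 / 2;           /* integer division: 3 */
    r = 7 % 2;           /* remainder: 1 */
    f = 2.5 #+ 0.5;      /* floating point addition: 3.0 */
    printf( "%d %d %d*n", q, r, ##f );
}
  prints: 3 1 3

Had we written "2.5 + 0.5" by mistake, the compiler would not complain; it would simply add the two words as if they were integers, and the result would be meaningless as a floating point number.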

9.2 Relational Operators:

Relational operators are often used to test conditions for if statements, while statements, and so on. The integer relational operators are listed below.

==  equal to
!=  not equal to
>   greater than
<   less than
>=  greater than or equal to
<=  less than or equal to

All these operators perform integer comparisons. For floating point comparisons, the operators are the same as above, except that they are preceded by a '#'.

All relational expressions have a numerical value. The value is 1 if the relation is true and zero if the relation is false. Thus

a = (b < c);

assigns 1 to "a" if "b" is less than "c", and zero otherwise.

One of the most frequent errors that B beginners make is typing a single "=" when they really want "==". "=" is an assignment operator. Thus if you say something like

if (a = b) ...

you are not testing to see if "a" equals "b"; you are actually assigning "a" the value of "b"! You will not get an error from this -- it is a perfectly valid operation in B -- but quite often, it is not really what you want.

Relational operators are evaluated from left to right after all arithmetic operators have been evaluated. Thus

a + b > c + d

is equivalent to

(a + b) > (c + d)

Parentheses can be used to change the order of operation, so that

a + (b > c) + d

is "a + 1 + d" if "b" is greater than "c" and "a + 0 + d" otherwise.
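
A short sketch with sample values (chosen only for this example) shows the difference between the two groupings.

main() {
    auto a, b, c, d;
    a = 10; b = 7; c = 3; d = 1;
    printf("%d*n", a + b > c + d);      /* (10 + 7) > (3 + 1) is true: prints 1 */
    printf("%d*n", a + (b > c) + d);    /* 10 + 1 + 1: prints 12 */
}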

9.3 Logical Operators:

The simplest logical operator is '!', the unary "not". Since B uses the integers one and zero to represent "true" and "false" respectively, '!' works in an arithmetic way too. If "a" is non-zero, "!a" is zero; if "a" is zero, "!a" has the value one. Since '!' is a unary operator, it is evaluated before both arithmetic and relational operators.

The operator '&&' is the logical "and" in B. Thus

a && b

has a value of one if both "a" and "b" are non-zero, and a value of zero otherwise.

The logical "or" in B is represented by '||'. Thus

a || b

has a value of zero if both "a" and "b" are zero, and a value of one otherwise.

'&&' and '||' are evaluated after all relational operators have been evaluated.
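
The following sketch prints the values of a few logical expressions. The variable names and values are chosen only for this example.

main() {
    auto a, b;
    a = 5;
    b = 0;
    printf("%d %d*n", !a, !b);          /* prints 0 1 */
    printf("%d %d*n", a && b, a || b);  /* prints 0 1 */
    printf("%d*n", a > 0 && b == 0);    /* relations first: 1 && 1, prints 1 */
}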

9.4 Assignments:

We often speak of an "assignment statement", but this can be misleading in B. To be sure, some statements in B consist only of assignments of values to variables. However, assignment is an operation which can be done in any kind of B expression, not just in explicit "assignment statements". For example, we have already pointed out that you can have a statement of the form

if (a = b) ...

without getting errors (although it may not be what you want). This is because assignment is an expression, and all expressions are legitimate in if statements (see Section 10).

The value of an assignment expression is the value being assigned. Thus

a > (b = c)

assigns the value of "c" to "b", and then tests to see if that value is less than "a". Because of this, multiple assignments are valid.

a = b = c ;

is equivalent to

a = (b = c);

The result is that both "a" and "b" are assigned the value of "c".

In addition to the simple assignment operator '=', there are a number of other useful assignment operators that can be used to give variables modified versions of their previous values. These operators are all formed by combining one of the normal binary operators with the '=' character, as in '+='. Typing

x += 3 ;

is equivalent to

x = x + 3 ;

However, the first form is preferable because it is more compact and easier to read. There are also assignment operators of the form '-=', '%=', and so on -- one for each binary arithmetic operation.

Note that assignment operators are the last operators to be evaluated in an expression. Thus

x *= a + b ;

is equivalent to

x = x * (a + b) ;

This is quite a bit different than

x = x * a + b ;

where the multiplication is evaluated before the addition.
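
The sketch below puts the two forms side by side, using sample values chosen only for this example.

main() {
    auto x, y, a, b;
    a = 2;
    b = 3;
    x = y = 4;          /* multiple assignment: both become 4 */
    x *= a + b;         /* x = x * (a + b), so x becomes 20 */
    y = y * a + b;      /* (y * a) + b, so y becomes 11 */
    printf("%d %d*n", x, y);    /* prints 20 11 */
}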

10. The If Statement

The simplest form of the if statement is

if (expression) statement ;

where "expression" is any legitimate B expression and "statement" is any B statement. Thus

if (a > b) c = b ;

assigns "c" the value of "b" if "a" is greater than "b".

It is also possible to have a compound statement after the "if", as in

if (a > b) {
    c = a ;
    a = b ;
    b = c ;
}

(This short routine checks whether "a" is greater than "b", and if so, it exchanges the two values.)

The statement or compound statement in an if statement is executed if and only if the value of <expression> is non-zero. Thus

if (a) printf("%d",a) ;

prints the value of "a" provided it is non-zero. This format is equivalent to

if (a != 0) printf("%d",a);

but is more concise. Most programmers use the first format if "a" is considered to be a "logical" variable and the second format (!= 0) if "a" is numeric. Similarly, saying

if (!a) ...

is equivalent to

if (a == 0) ...

Again, the first form is used if "a" is a logical variable and the second form is used if "a" is numeric.

As a final example, we point out that an if of the form

if (a = b) statement;

has the following effect. The variable "a" is assigned the value of "b"; if "b" is non-zero, "statement" is executed since the value of the if-condition is non-zero; if "b" is zero, "statement" is not executed.

A more advanced form of the if statement is one containing an else clause. The general form of this type of if is

if (<expression>) <statement>; else <statement> ;

This is used in the simple function below, designed to return the maximum of two numbers.

max1(a,b) {
    if (a > b) return(a);
    else return(b);
}

else clauses may take compound statements in the same way that ifs can. For example,

max2(a,b) {
    if (a > b) {
        printf("%d*n",a);
        return(a);
    }
    else {
        printf("%d*n",b);
        return(b);
    }
}

is a function that not only returns the maximum of two numbers but prints that maximum as well. The above example also demonstrates how a good brace bracket style serves to illustrate logical groupings and helps avoid the problem of unbalanced braces.

Ifs and elses can be nested in a very straightforward way. This is done in the somewhat clumsy function below which returns the maximum of three integers.

max3(a,b,c) {
    if (a > b) {
        if (a > c) return (a);
        else return (c);
    }
    else {
        if (b > c) return (b);
        else return (c);
    }
}

Another way that ifs and elses can be nested is shown in the silly function below that checks which one of its four arguments is zero. (This could be used in a program that simulated the old shell game.)

which(a,b,c,d) {
    if (!a) printf("No.1 is zero.*n");
    else if (!b) printf("No.2 is zero.*n");
    else if (!c) printf("No.3 is zero.*n");
    else printf("No.4 is zero.*n");
}

Note that here we have made use of the fact that "if (!a)" is the same as "if (a == 0)" (since "!a" is non-zero when "a" is zero). The function also assumes that "d" is zero if none of the first three are.
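
The same chained layout works for any set of mutually exclusive tests. As a further sketch, the function below (the name "sign" is chosen only for this example) returns 1, -1 or 0 depending on whether its argument is positive, negative or zero.

sign(n) {
    if (n > 0) return(1);
    else if (n < 0) return(-1);
    else return(0);         /* reached only when n is zero */
}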

Although constructions like the one above are perfectly correct logically, they can quickly become unreadable if things go on too long. Often it is better to use B's switch statement instead of long chains of ifs and elses. The switch statement is described in Section 17.

11. Looping

There are several constructions in B that can be used for looping. In this section, we will examine a few of these.

11.1 The While Statement:

The general form of the while statement is

while (expression) statement ;

It is possible to use a compound statement in place of the simple statement above, in which case the ';' is not used.

The while statement tests the given expression to see if it is non-zero. If it is, the statement or compound statement is executed, after which the expression is tested again. The statement or compound statement will be executed repeatedly until the expression is found to be zero. If the expression is zero to begin with, the statement is not executed at all. For example,

while ( ( c = getchar() ) != '*n') putchar(c);

uses GETCHAR to obtain a character from the current input unit and then checks to see if the character is a new-line '*n'. As long as the input character isn't a new-line, the character will be printed with PUTCHAR, and the whole process will begin again. Thus the single statement above reads in an entire line of input and prints it out (minus the new-line character on the end).

Notice that we had to put the "c = getchar()" in parentheses. This is because we wanted to perform the input operation and the assignment before the comparison. If we had omitted the parentheses, the comparison would have been done before the assignment (since assignments are always done last), and therefore "c" would have received the value zero or one depending on whether the input character was equal or not equal to '*n'.
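
The same parenthesized pattern appears in many small programs. As a sketch, the program below counts the characters on one line of input (the variable names are chosen only for this example).

main() {
    auto c, n;
    n = 0;
    /* assign first, then compare the assigned value with new-line */
    while ( (c = getchar()) != '*n') n += 1;
    printf("%d characters*n", n);
}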

Below we give another example of a program using a while loop. This program reads in positive integers from the terminal and prints out their average. As the integers are typed in, they may be separated by any number of blanks or new-line characters. A zero indicates the end of input.

main() {
    auto c,n,num,sum,done;
    sum = n = 0;
    done = 0;
    c = getchar();
    while (!done) {
        while ( (c == ' ') || (c == '*n') )
            c = getchar();
        num = 0;
        while ( (c != ' ') && (c != '*n') ) {
            num = num * 10 + (c - '0');
            c = getchar();
        }
        if (num == 0) done = 1;
        else {
            n += 1;
            sum += num;
        }
    }
    printf("Average = %d %d/%d*n",sum/n,sum%n,n);
}

There are several points we would like to make about the above program. The outer "while" loop ("while (!done)") will continue to loop and obtain numbers until one of the numbers obtained turns out to be zero. At this point, the if statement sets "done" to 1, so that the next time the "while" condition is tested "!done" will be zero. Thus the variable "done" is being used as a logical switch.

The second "while" loop flushes through any blanks or new-line characters in the input. When it finds a character which is neither a blank nor a new-line, it assumes that it has found the start of a number. If the character is NOT an ASCII digit, our program will not receive any error diagnostics, but its result will naturally be meaningless.

The third "while" loop collects the number from the input. Collection of the number stops when the loop finds another blank or new-line character. The statement

num = num * 10 + (c - '0');

may seem confusing at first. Consider the case where the number 12 has already been collected in "num" and the ASCII character '3' has just been obtained with GETCHAR. First we multiply by 10 to get 120, and next we would like to add on the three to get 123. However, there's a catch: the ASCII character '3' is not the number three -- in fact, it's octal 063. To make the conversion from character to integer, we subtract an ASCII '0', octal 060. This gives us the integer 3 that we want. In the same way, '0' - '0' is the integer 0, '9' - '0' is the integer 9, and so on. This method of converting characters to integers is a very common trick you should make sure you understand. (A small test of your understanding: will '10' - '0' give you the integer 10?)
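
The conversion can be packaged with a range check, so that characters which are not digits are detected instead of silently producing nonsense. The function below is only a sketch: the name "digit" and the choice of -1 as an error value are ours, not part of the B library.

digit(c) {
    /* ASCII digits run consecutively from '0' (060) to '9' (071) */
    if ( (c >= '0') && (c <= '9') ) return(c - '0');
    return(-1);         /* not a digit */
}

A program such as the averaging example could call a function like this and stop or complain when it gets -1 back.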

Once the number has been collected in "num", the program sees if it is zero. If so, the looping process is finished; if not, the number is added to the accumulated "sum" and the count "n" of integers is incremented by one.

Note that the PRINTF function prints out the integer part of the average followed by a fraction. For example, if you were to use the program above to find the average of the first ten positive integers, the output would have the form

Average = 5 5/10

11.2 Labels and the goto Statement:

Any statement in a B function can have one or more labels. A label is just an ordinary B identifier name followed by a colon ':'. The goto statement can be used to branch to any label in the same function; you cannot use a goto to branch out of a function. Below is a simple function that makes use of the goto statement and labels. The function takes a string as its argument and outputs the characters of the string in reverse order.

reverse(string) {
    auto i;
    i = length(string);
loop:
    i -= 1;
    putchar(char(string,i));
    if (i) goto loop;
}

Note that this example has made use of two library functions that often come in handy when working with strings. LENGTH returns the number of characters in the string, not counting the 000 character that marks the end of the string. CHAR is a function that returns the character which is in the i'th position of "string". Thus the loop above starts its output with the last character in "string" and works backwards to print the string in reverse order.

You should also note the way the above function deals with the positions of characters in "string". The character at the beginning of any string is in position zero, the next character is in position one, and so on. Thus if there are N characters in a string, the last character will be in position N-1. This is the reason that we decremented "i" before we printed out any characters.

Anyone working with B should always think twice before using a goto. Modern programming theory seriously frowns on gotos in any language, since they frequently lead to "spaghetti programming" and unreadable code. Almost always, gotos are unnecessary; the example above could have been handled just as easily using a while statement (we do this in Section 14.1). gotos can also be circumvented using function calls and if-elses with compound statements.

Not only are gotos unnecessary and sloppy, they are often dangerous. For example, it is legal to branch into a compound statement in the middle of a "while" loop, but this can often lead to nasty surprises if you aren't very careful. All things being considered, a program without gotos is apt to be a more clearly thought-out program than one with gotos branching all over the place.

11.3 Other Looping Statements:

In this section, we will very briefly touch on the other iterative statements available in B.

The repeat statement will repeat a statement or compound statement forever. There are several ways out of this (e.g. return statements) but the repeat itself will never stop repeating. Below is a very simple function using the repeat statement. When called, the function will wait until the time of day is past noon, and then it will return to its caller.

waitnoon() {
    extrn noonpulse;
    repeat {
        sleep(5 * 60);
        if ( time() > noonpulse) return;
    }
}
noonpulse {64 * 1000 * 3600 * 12};

The SLEEP library function used above causes the program to go to sleep for a specified number of seconds; in this case, it is five minutes. The TIME library function returns the time of day in "pulses", where one pulse is 1/64th of a millisecond. Thus the external variable "noonpulse" is initialized to the value of "noon" in pulses. "waitnoon" will repeatedly go to sleep for five minutes, wake up to check the time, and go back to sleep again if it isn't yet noon.

The do-while statement is very similar to the while statement except that the conditional test is done at the bottom of the loop instead of the top. Thus

do c = putchar(getchar()); while (c != '*n');

will repeatedly read and print characters until it finds a new-line. The new-line itself will be printed out because the condition is not tested until after the statement is executed.

The for statement is a more complicated iterative statement that serves a similar purpose to FOR in Algol. For further details, see the B Reference Manual.


12. Comments

Comments can be placed anywhere in a B program, inside or outside of function bodies. A comment can even be placed in the middle of statements. In fact, comments can be placed anywhere you can put a space character.

The beginning of a comment is marked by a "/*" and the end is marked by "*/". The compiler will ignore everything in between. For example,

fact(n) {       /* This function calculates n! */
    auto i,     /* will go from 1 to n */
         j;     /* will accumulate n!  */
    i = j = 1;
    while (i < n) {
                /* loop does actual computation */
        i += 1;
        j *= i;
    }
    return(j);  /* hand the result back to the caller */
}               /* end of function */

The above example shows several popular ways of using comments to explain what the statements of a function are doing.

Comments may NOT be nested. For example,

/* This comment /* has a comment inside. */ */

will result in a syntax error.

13. Vectors

People who have worked with other programming languages will be familiar with the idea of a vector. However, the actual implementation of vectors in B is quite different from the implementation in most other languages.

At its simplest, a vector is just a collection of consecutive words in memory along with an associated name. The name of the vector does not refer directly to the vector itself; instead, the vector name refers to a single word of memory which in turn contains the address of the first word of the vector. The word that holds the address of the vector is known as the "vector pointer".

It is very important to keep this distinction clear in your mind. If, for example, "vec" is the name of a vector, the statement

vec = 5;

will not set any of the words in the vector to the value 5. Instead, the statement puts the integer 5 into the single word of storage which previously held the address of the vector. Consequently, unless you've saved the vector's address some place else, the statement above has just written over the only way you had of finding your vector.
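
The sketch below illustrates the point with an auto vector (auto vectors are described in Section 13.3). The function name "demo" and the variable "save" are chosen only for this example.

demo() {
    auto vec[10], save;
    save = vec;         /* keep a copy of the vector pointer */
    vec = 5;            /* overwrites the pointer, not any word of the vector */
    vec = save;         /* restore the pointer so the vector can be found again */
    vec[0] = 5;         /* this is how a word of the vector is set to 5 */
}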

Simple variables in B are typeless, and so are vectors. The first word of a vector can contain a decimal integer, the next can hold the address of a string, and the next can be a floating point number. Again, it is up to the programmer to keep track of the kinds of objects that are stored in the words of a vector. Another thing that you should bear in mind is that a vector is always viewed as a sequence of single words -- there is no direct way to allocate a vector of multi-word records, though of course you can simulate this situation with a little work.

In the subsections to come, we will discuss various aspects of the use of vectors in B.

13.1 Subscripting:

Subscripts for vectors are specified in square brackets immediately following the name of the vector pointer as in

vec[10]

The above construction refers to the word with index 10 in the vector "vec" (the eleventh word, since indexing starts at zero). The address of this word is obtained simply by adding the number 10 to the vector pointer "vec". In the same way, "vec[0]" refers to the start of the vector, since B simply adds zero to the word that points to the start of the vector. Thus all vectors in B have a base index of zero (not one, as in Fortran for example).

B performs the subscripting operation quite mindlessly, simply adding the subscript to the value of the vector pointer. Thus

a[i]

refers to the memory address obtained by adding the contents of "i" to the contents of "a". Surprising as it may seem to people who are used to other languages, you get exactly the same result by typing

i[a]

Since B is typeless, the compiler makes no effort to check whether "a" or "i" is actually the name of a vector; as far as the compiler is concerned, both are simply the names of single words in memory.
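
A short sketch makes the symmetry concrete. It uses an auto vector (see Section 13.3), and the names and values are chosen only for this example.

main() {
    auto a[5], i;
    i = 3;
    a[i] = 42;
    /* both subscript forms add the same two words, so both print 42 */
    printf("%d %d*n", a[i], i[a]);
}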

B does NOT check whether an index has gone past the end of a vector. For example, if you have only declared "vec" as 20 words long, you can still talk about "vec[21]", "vec[30]" or even "vec[100]" without getting an error message. Of course this is a very dangerous practice, since it's hard to tell what is in memory beyond the end of your vector. You may be looking at garbage, you may be looking at storage for another variable or vector, or you may be looking at the executable code of your program. If you go too far past the end of your vector, you may even try to look at memory that hasn't been allocated to your program and so end up with a memory fault error from the system. In B, it is the programmer's responsibility to make sure that indices don't run off the end of a vector.

Subscripts can be expressions as well as simple variables or constants. Thus the following are all valid examples of subscripts:

d[x - '0']
test[a > b]
vector[i += 2]

As a last example, suppose you had declared a vector "hello" with maximum index of 30 and then assigned a value of "hello+10" to the variable "goodbye". "goodbye[0]" would thus be "hello[10]", since the vector pointer in "goodbye" is ten more than the vector pointer in "hello". "goodbye[-1]" would be a perfectly good construct in this case, referring to "hello[9]". Using negative indices might be dangerous when you don't know what lies in front of a vector in memory, but in a case like this there is absolutely nothing wrong with negative indices.
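
Written out as a sketch, with sample values chosen only for this example and "hello" declared as an auto vector (see Section 13.3), the situation looks like this:

main() {
    auto hello[30], goodbye;
    hello[9] = 111;
    hello[10] = 222;
    goodbye = hello + 10;       /* points ten words into "hello" */
    printf("%d %d*n", goodbye[0], goodbye[-1]);     /* prints 222 111 */
}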

13.2 External Vectors:

Vectors may be either external or auto, just like simple variables. When an external vector is going to be used in a function, it must be declared in an extrn statement as in

extrn vec;

Notice that you do NOT specify the size of the vector in the extrn statement.

The size of an external vector is given when you specify the vector outside all the functions of your B program. The simplest way to specify a vector is with a statement like

vec[30];

outside all the function bodies of your program (generally at the end of the program along with all the other external variable specifications). The above statement allocates a vector named "vec" with a maximum index of 30. Since the first word of the vector has an index of 0, the specification above actually allocates 31 words of memory for "vec".

Vectors can be initialized in the same way as simple variables, by specifying the initialization values in brace brackets after the vector name. Below we give some typical external vector initializations.

myvec[2] {0,1,2};
yourvec[] {"This is a string", 3, 'a*n', -1};
hisvec[14] {'x','y','z'};
hervec[1] {07, 077, 0777, 07777, 077777, 0777777};

The declaration of "myvec" is quite simple. A maximum index of 2 means three machine words, and these words are initialized to 0, 1, and 2 respectively.

The declaration of "yourvec" demonstrates that you don't have to specify a maximum index if you give some initialization values -- B will allocate as many words of memory as you have initialization values. Thus "yourvec" will have four words of memory, or a maximum index of 3. "yourvec[0]" is a word which contains a pointer to a string constant. "yourvec[2]" contains a two-character ASCII constant, the letter 'a' followed by a new-line character. "yourvec[1]" and "yourvec[3]" contain normal integers.

The declaration of "hisvec" shows that you don't have to initialize all the words in a vector. Fifteen words are allocated for "hisvec" because of the maximum index specification of 14. "hisvec[0]" contains the ASCII character 'x', "hisvec[1]" contains 'y', and "hisvec[2]" contains 'z'. The other words of "hisvec" are not initialized and hence will contain garbage until they are assigned values during program execution.

The declaration of "hervec" shows that you can specify more initialization values than the maximum index indicates. In this case, the maximum index is ignored and the vector is created large enough to hold all the initialization values; thus "hervec" would actually have a maximum index of 5.
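
To show an initialized external vector in use, here is a sketch of a complete program that adds up the words of one. The vector name "primes" and its contents are chosen only for this example.

main() {
    extrn primes;               /* no size is given in the extrn statement */
    auto i, sum;
    sum = 0;
    i = 0;
    while (i <= 4) {            /* maximum index is 4, so five words */
        sum += primes[i];
        i += 1;
    }
    printf("%d*n", sum);        /* 2 + 3 + 5 + 7 + 11: prints 28 */
}
primes[4] {2, 3, 5, 7, 11};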

13.3 Auto Vectors:

Auto vectors are declared in an auto statement in the function that intends to use them. For example,

auto x[10],y[20];

creates a vector "x" with a maximum index of 10 and a vector "y" with a maximum index of 20. You can also declare auto vectors with their maximum index specified in a constant expression as in

auto alphs['z' - 'a' + 1];

Unlike external vectors, auto vectors cannot be initialized when they are created; the contents of an auto vector must be set explicitly using assignment operations.
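
In practice this usually means filling the vector in a loop before it is used. The sketch below does so for a table of squares; the names are chosen only for this example.

main() {
    auto sq[9], i;              /* maximum index 9, so ten words */
    i = 0;
    while (i <= 9) {
        sq[i] = i * i;          /* every word is set explicitly */
        i += 1;
    }
    printf("%d*n", sq[9]);      /* prints 81 */
}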

13.4 Multi-Subscripting:

Unlike some other languages, B does not explicitly support arrays of two or more dimensions. However, you can use as many subscripts with a vector name as you want, since subscripting is really just a simple operation that obtains the address of a word in memory by adding the contents of two other words. Consider, for example, a construction like

x[i][j]

The value of "i" is added to the vector pointer "x" to get the address of a word in memory. Next, the contents of that word are added to the contents of "j" to get a second address. "x[i][j]" refers to the contents of that second address. Thus if "x" was an external vector initialized as

x[] {"hello","abcdefghi"};

ask yourself what the value of "x[1][1]" would be. "x[1]" contains a pointer to the string "abcdefghi". Adding a 1 to that address gives the word that contains the four ASCII characters 'efgh'. Thus the contents of "x[1][1]" would be 'efgh'. This example serves to point out the similarity between vectors and strings in B, since the contents of both are accessed indirectly through a pointer word.
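
The same mechanism can be used to simulate a two-dimensional table: build a vector whose words are pointers to other vectors. The sketch below is one way this might be done; the names "table", "row0" and "row1" are chosen only for this example.

main() {
    extrn row0, row1;
    auto table[1];              /* two words, one pointer per row */
    table[0] = row0;
    table[1] = row1;
    /* table[1] is the pointer to row1; adding 2 reaches its last word */
    printf("%d*n", table[1][2]);        /* prints 6 */
}
row0[2] {1, 2, 3};
row1[2] {4, 5, 6};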


13.5 GETVEC and RLSEVEC:

At compile time, you often don't know how big a vector you'll need for a certain purpose; for example, you may be reading in a variable number of integers, with the first integer telling how many entries will follow. One rather mindless solution to this problem is to allocate a gigantic vector which you hope is big enough to hold the maximum number of elements you might ever have. Obviously though, this can waste a lot of space in memory, especially if most of the jobs you run are small in comparison to the maximum size.

In cases like these, it is best to allocate your vectors dynamically. Thus you do not declare your vectors with a specific size in your source code, but instead wait until execution time to decide how much memory your vectors will need.

In B, vectors can be allocated using the GETVEC library function. This is called with a statement of the form

ptr = getvec(size);

where "size" is the maximum index you want the vector to have. The value which GETVEC returns (to "ptr") is a standard vector pointer to the memory space which GETVEC has obtained. From this point on, you can refer to the elements of this vector as "ptr[0]" up to "ptr[size]".

When you are finished with a dynamically allocated vector, it is usually a good idea to release the memory which the vector occupies. This does not actually make your program smaller, since the released memory is not returned to the system; however, it does free up the memory and make it available for allocating new vectors. Thus if you release your vectors when you are through with them, your program may not need to grow to get space for more vectors. In this way, you help to keep the size of your program as small as possible.

To release vector storage, you can use the library function RLSEVEC. This is invoked with a call of the form

rlsevec( ptr, size );

where "ptr" points to the beginning of the vector's storage and "size" is the maximum index of the vector.
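
Putting the two calls together, a sketch of the usual pattern might look like the following. The names and the size of 100 words are chosen only for this example; in a real program "n" would normally be computed or read in at execution time.

main() {
    auto n, v, i;
    n = 100;
    v = getvec(n - 1);          /* maximum index n-1, so n words */
    i = 0;
    while (i < n) {
        v[i] = 0;               /* the new vector is used like any other */
        i += 1;
    }
    rlsevec(v, n - 1);          /* same maximum index as in the GETVEC call */
}

Note that the pointer and the maximum index given to RLSEVEC match the ones used with GETVEC, so it is worth keeping both values around for as long as the vector is in use.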

14. More Operators

In this section, we will discuss a number of operators which are sometimes not found in other programming languages.

14.1 Increments and Decrements:

One of the most common operations in computer programming is incrementing or decrementing a variable by one. Because these operations are so common, B has four special unary operators that provide a very neat way to increment or decrement variables.

The "++" operator increments a variable by one. If the operator is placed before the variable as in "++i", the variable is incremented before it is used; if the operator is placed after the variable, the variable is incremented after it is used. For example, the two statements below zero the contents of a vector with maximum index 10.

i = 0 ;
while (i <= 10) vec[i++] = 0;

The while loop begins with "i" at zero. Immediately after "i" is used as zero in "vec[0]", "i" is incremented by one. Thus the next time through the loop, "i" has the value one. Once this value is used, "i" is incremented once more, and so on. We could also have said

i = -1;
while (i < 10) vec[++i] = 0;

This time "i" starts out at -1 and is incremented to zero before it is used as a subscript. Note that writing the process in this way has necessitated changing the "<=" to a "<" in the while condition.
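
The difference between the two placements is easiest to see when the value of the expression is saved. In the sketch below the names and values are chosen only for this example.

main() {
    auto i, a, b;
    i = 5;
    a = i++;            /* a gets 5, then i becomes 6 */
    b = ++i;            /* i becomes 7, then b gets 7 */
    printf("%d %d %d*n", a, b, i);      /* prints 5 7 7 */
}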

The "--" operator decrements a variable by one. If the operator is placed before the variable, the variable is decremented before it is used; if the operator is placed after the variable, the variable is decremented after it is used. As an example, we will use this operator to rewrite the "reverse" function given in Section 11.2.

reverse(string) {
    auto i;
    i = length(string);
    while (i--) putchar(char(string,i));
}

The while loop checks the value of "i" and decrements it immediately afterwards. The decremented value of "i" is the one that is used in the call to CHAR. The loop will end when the while condition finds that "i" is zero. Note that this will be after the zero'th character has been obtained from "string" by CHAR. Note also that "i" will have the value -1 at the end of the loop: the loop stops when "i" is found to be zero, but immediately afterwards "i" is once more decremented by one.

You can greatly simplify your programs by using "++" and "--" for incrementing and decrementing by one. They are generally cleaner and more readable than any of the alternative ways of coding the same process. As an exercise, you might go back over the program examples we've presented up to this point and see where we could have used "++" and "--" in place of less elegant substitutes.

As a final note, you should be aware that these operations can appear inside any B expression, or they can stand alone as a single B statement. Thus the else statement of the program in Section 11.1 could have read

else {
    ++n ;
    sum += num;
}

14.2 Bitwise Operators:

For many applications, it is useful to be able to work with the individual bits of a machine word. In this section and the next, we will be looking at operators which manipulate bits.

The simplest bitwise operator is "~". This takes the one's complement of a machine word, i.e. it turns all the one bits in the word into zeroes, and it turns all the zero bits into ones. Thus "~0" is the octal number 0777777777777.

"&" and "|" indicate the bitwise AND and OR operations respectively.

x & y

yields a machine word which contains a one bit in every position where both "x" and "y" have one bits, and which contains a zero bit in every other position. Thus "001 & 011" results in 001.

x | y

yields a machine word which contains a one bit in every position where either "x" or "y" or both have one bits, and which contains a zero bit in every other position. Thus "001 | 010" results in 011.

It is very easy to confuse the bitwise operators "&" and "|" with the logical operators "&&" and "||". This can lead to many unfortunate errors. For example, consider the two expressions

010 & 001
010 && 001

The first does a bitwise AND operation. Because the two operands do not have any one bits in coinciding positions, the result is zero. The second expression does a logical AND operation. Because both operands are "true" (non-zero) the logical expression is also "true", and the result is one! To avoid problems like this, you should always double-check your ANDs and ORs to make sure you have typed in the operator you really wanted.
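
A two-line sketch prints the results of both kinds of operator on the same operands, so the difference can be seen directly.

main() {
    printf("%d %d*n", 010 & 001, 010 && 001);   /* prints 0 1 */
    printf("%d %d*n", 010 | 001, 010 || 001);   /* prints 9 1 */
}

The 9 in the second line is octal 011 printed in decimal by "%d".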

Some common examples of bitwise operations are shown below.

y &= 0777;
a = (b & 0777777777000) | (c & 0777);

The first statement above clears all but the bottom nine bits of the variable "y". If "y" had contained 'abcd', the result would be just 'd'. Note that this uses an assignment shorthand like the ones in Section 9.4. The second statement above creates a word consisting of the three upper bytes of "b" and the lowest byte of "c". Thus if "b" contained 'abcd' and "c" contained 'efgh', "a" would be assigned 'abch'. The second statement could also have been written

a = (b & ~0777) | (c & 0777);

This alternate format is preferable to the previous form because it is less cluttered and doesn't force you to "count 7's".

In addition to the operators described above, there is also an exclusive OR operator "^". This is described in the B Reference Manual.

14.3 Bit-shifting Operators:

The operators "<<" and ">>" shift the bits of a machine word left and right respectively. The bits of the word which are vacated by the shift are filled with zeroes. The bits which are shifted out of the word vanish forever; they are NOT shifted into adjacent words of memory. For example,

y = x >> 9;

assigns "y" the value of "x" shifted nine bits to the right. If "x" contains 'abcd', "y" will receive 'abc'. The ASCII 'd' occupying the bottom nine bits of "x" is shifted completely out of the word.

The routine below uses "<<" to pack four input characters into a single variable "h".

i = h = 0;
while ( ++i <= 4) h = (h << 9) | getchar() ;

The second statement shifts "h" left to make room for the new character and then ORs in the new character into the zeroes in the bottom byte of "h".
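
To see the packing undone, the sketch below reads four characters into "h" as above and then uses ">>" and "&" to write them out again, starting with the top byte. This program is our own illustration; it assumes that at least four characters are available on the current input unit.

main() {
    auto h,i;
    i = h = 0;
    while ( ++i <= 4) h = (h << 9) | getchar() ;
    i = 0;
    while (i < 4) {
        /* i = 0 selects the top byte, i = 3 the bottom byte */
        putchar( (h >> (27 - 9*i)) & 0777 );
        ++i;
    }
    putchar('*n');
}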

With these shift operators, we are now in a position to write a program that duplicates what the library function CHAR does (see Section 11.2).

char(s,n) { /* s points to a string;
             * CHAR returns nth character of s */
    auto word,shift,pos;
    word = s[n/4];  /* get word containing char */
    pos = n%4;          /* get position in word */
    shift = 27-9*pos;   /* no. of bits to shift */
    return( (word>>shift) & 0777 );
           /* shift word and clear excess chars */
}

There are several points to make about the above function. For example, note how it accesses the correct word in the string "s" with subscripts. There are four characters per word in the string "s"; thus the n'th character is in word "n/4". To get this word, we add the subscript "n/4" to "s" since "s" contains a pointer to the string, not the string itself. Thus "s[n/4]" is the machine word which contains the character we want.

Now we want this character to be in the lowermost byte of the word that CHAR returns. To do this, we will shift "word" the correct number of bits and use "&" to get rid of the characters we don't want. If the character we want is in the top byte of "word", we must shift 27 bits; if the desired character is in the second byte from the top, we must shift 18 bits, and so on. You should go through the program above and make sure that you understand how it accomplishes its objectives.
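
As a worked example, suppose "s" points to the string "hello there" (a string we have chosen purely for illustration). The string is packed four characters to a word, so the first two words are 'hell' and 'o th'. Tracing the function by hand for two values of "n" gives

n = 0:   word = s[0] = 'hell',   pos = 0,   shift = 27,   result 'h'
n = 6:   word = s[1] = 'o th',   pos = 2,   shift = 9,    result 't'

In the second case, shifting 'o th' right by 9 bits leaves 't' in the bottom byte, and the "&" with 0777 clears the 'o' and the blank above it.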

As a final note, we will point out that the whole CHAR function could have been written in the one (unreadable) line given below.

char(s,n) { return( (s[n/4] >> (27-9*(n%4))) & 0777 ); }

For normal applications however, it is best to avoid code like this that is compact but undecipherable.

14.4 The Query Expression:

The query expression is a rather unusual one. It is a conditional expression, since the value of the expression depends on a standard B condition. The query expression has the form

condition ? expr1 : expr2

If the condition is true, the expression has the value of expr1; if the condition is false, the expression has the value of expr2. For example, consider

x = (a > b) ? a : b ;

If "a" is greater than "b", "x" is assigned the value of "a"; otherwise, it is assigned the value of "b". Thus "x" is assigned the maximum of "a" and "b". With the query expression, we can rewrite the function "max1" given in Section 10:

max1(a,b) {
    return( (a > b) ? a : b );
}

This is obviously a much more elegant way of writing "max1" than doing the same thing with if statements.
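
The same pattern works for other small functions. The sketch below is our own; the name "abs1" is hypothetical and simply follows the naming of "max1". It returns the absolute value of its argument.

abs1(a) {
    return( (a < 0) ? -a : a );
}

Because the query expression is an expression and not a statement, it can also appear directly in an argument list, as in

printf("%d file%s*n", n, (n == 1) ? "" : "s");

which adds an "s" to the word "file" unless "n" is exactly one.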

14.5 Addressing and Indirection:

B provides two unary operators which can be used for addressing and indirection purposes.

The unary "&" is the address operator. For example, the value of "&x" is the memory address of the variable "x". Like all addresses in B, this address is a maximum of 18 bits long and will be located in the bottom half of the word.

The unary "*" is the indirection operator. It indicates that its argument should be taken as an actual address. For example, "*0100" refers to the contents of machine address 0100. If the variable "x" contains a memory address, "*x" can be used to refer to the contents of that address. Thus "*x" is the same thing as "x[0]" since both use "x" as the address of another location in memory. In the same way, "*(x+1)" is the same as "x[1]", and so on.

Since "&x" is the address of "x" and since "*&x" refers to the contents of the machine word whose address is "&x", "*&x" is the same as "x" itself.
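
The short program below is a sketch of our own that puts the two operators together. The variable "p" is given the address of "x", and the assignment through "*p" then changes "x" itself, so the PRINTF should print the value 7 three times.

main() {
    auto x,p;
    x = 5;
    p = &x;         /* p holds the address of x */
    *p = 7;         /* same effect as x = 7 */
    printf("%d %d %d*n", x, *p, *&x);
}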

In Section 19.1, we will give some examples of how the addressing and indirection operators can be used.

15. Escape Sequences

Escape sequences are used to represent certain characters that are difficult or impossible to enter in a particular context. The '*n' construct for new-line is an escape sequence with which you are already familiar. Another commonly used escape sequence is '*e' representing the ASCII null character 000.

Both '*n' and '*e' are used to represent special characters in ASCII character constants and strings. Other common escape sequences for character constants and strings are

*'      ' (single quote)
*"      " (double quote)
*t      tab
*r      carriage return (no line feed)
**      * (asterisk)

Escape sequences are also used to enable a user to input characters that might not appear on his or her terminal keyboard. For example, "$(" and "$)" can be used to denote the brace brackets "{" and "}" respectively.

A complete list of escape sequences accepted by B is given in Appendix C.
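
As a small illustration (our own, using only the escape sequences listed above), the program below prints one line containing a tab and a second line containing double quotes and asterisks.

main() {
    printf("name*tvalue*n");
    printf("*"quoted*" and **starred***n");
}

The second call prints the text "quoted" (with its double quotes), the word "and", and then *starred* with an asterisk on each side. Note that the final "***n" is read from left to right as "**" (an asterisk) followed by "*n" (a new-line).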

16. Manifest Constants

A manifest constant (or more briefly, a manifest) is just a symbol which can be used during the compilation process to stand for a string of characters. Manifests are defined in statements of the form

name = text;

where "name" is any valid B identifier name, and "text" is a collection of any characters except ';'. For example,

A = 10;
B = A + A;
C = B * B;

are all valid manifest definitions. (Note that we have put the name of our manifests in upper case. This is a convention which many programmers use to distinguish their manifest constants from ordinary variables.)

Whenever a manifest constant is encountered during the compilation process, the compiler replaces the name of the constant with the text that was associated with the constant in the manifest definition. Thus the statement

auto vec[A];

would be changed to

auto vec[10];

during compilation if "A" had been defined as "10". This is one of the most common uses of manifest constants: if you always use a manifest to represent the size of a vector rather than specifying the size explicitly, changing the size of the vector is quite simple. Rather than changing every place where the vector size is referenced, you need only change a single manifest definition.
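
The sketch below shows this idea in a complete program. The manifest name "VSIZE" is our own choice. Both the declaration and the loop refer to the manifest, so changing the single definition changes both.

VSIZE = 10;
main() {
    auto vec[VSIZE],i;
    i = 0;
    while (i < VSIZE) {     /* clear the vector */
        vec[i] = 0;
        ++i;
    }
}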

Note that manifest constants are purely textual abbreviations that are used in the compilation process. For one thing, manifests do not refer to locations in memory. For another, the manifests are expanded in a very mechanical way. Looking at the manifest definitions above, you might say to yourself that A was 10, B was 20, and C was 20*20 = 400; however, this is NOT true. Below we expand C in several steps.

C = B * B;
C = A + A * A + A;
C = 10 + 10 * 10 + 10;

Consequently, C will actually be evaluated as 10+100+10 or 120!

Manifests are very convenient to use in B programs. As you have seen however, you must be somewhat careful whenever you use them. Many people always put their manifest constants in parentheses as in

B = (A + A);
C = (B * B);

This avoids the "order of evaluation" problem which occurred in the previous definition of C.
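
To see why, we can expand the parenthesized definition of C in the same mechanical way as before.

C = (B * B);
C = ((A + A) * (A + A));
C = ((10 + 10) * (10 + 10));

This time C is evaluated as 20*20 or 400, which is what we wanted in the first place.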

Manifest constant definitions are placed outside the bodies of the functions of a B program, generally at the very top of the source code. Once a manifest has been defined, it can be used by any subsequent statement in the program, whether the statement is an executable statement, an external variable specification, or another manifest constant definition. A manifest constant cannot be used in statements that come before the manifest definition (which is one reason why most programmers define their manifests at the TOP of the source code).

Note that manifest constants may NOT be redefined. If you were to say

A = 20;

after A has been defined as 10, the compiler would replace the A with the text "10" to get

10 = 20;

Obviously, this will result in an error.

17. The Switch Statement

The switch statement is used in situations when a number of different actions may be performed, depending on the value of a certain key. This can best be illustrated by giving an actual example.

The function below decides if a character is valid in a B identifier name. If the character is valid, it is returned as the value of the function. If the character is invalid, a message is issued and the program is terminated. Dots are accepted, but a warning message is issued. The program also converts upper case letters to lower case.

check(c) {
    switch(c) {
      case 'A' :: 'Z' :
        c += 'a' - 'A';
      case 'a' :: 'z' :
      case '0' :: '9' :
      case '_' :
        break;
      case '.' :
        printf("Warning: avoid using dot.*n");
        break;
      default :
        printf("Invalid character:'%c'.*n",c);
        exit();
    }
    return(c);
}

The switch statement tests the key "c" to see which one of the following cases is pertinent. A case heading can specify a single case, as in

case '.':

or it can specify a range of cases separated by "::" as in

case 'A' :: 'Z' :

When the switch has decided which case is applicable, execution will jump down to the statement after the proper case heading. Thus if "c" has the value 'X', the switch will decide that the key is in the range 'A' :: 'Z' and begin executing the statement

c += 'a' - 'A';

(This conversion process is similar to the conversion of ASCII digits to integers shown in Section 11.1.) Execution continues until it reaches the end of the switch or a break statement. The break statement tells the program to "break out" of the switch statement and continue with the statement after the switch (in this case, the return statement).

If none of the specified cases are applicable, execution will jump from the start of the switch to the heading named default. If there is no default specified, execution would begin with the statement immediately following the switch. The default section of our function "check" contains a call to the library function EXIT. The EXIT function terminates a B program immediately. Thus if a function in the B program called "check" with an invalid character, the caller would never regain control; EXIT brings the program to an abrupt stop.

Under normal conditions, execution would continue from the end of the default section to the statement after the switch. However, because of the EXIT at the end of the default section, execution will never reach the return statement if there is an invalid character. Thus "check" only returns valid characters.

Note that we could replace the break statements in the above function with return statements. If we did so, "check" would return to its caller from the middle of the switch statement, and we would not have been forced to include the return statement after the switch. We chose to use the above method to demonstrate the use of the break statement.
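
For comparison, here is a sketch of how "check" might look with return statements in place of the breaks. This version is our own rewrite; it is intended to behave the same way as the function described above.

check(c) {
    switch(c) {
      case 'A' :: 'Z' :
        return(c + 'a' - 'A');
      case 'a' :: 'z' :
      case '0' :: '9' :
      case '_' :
        return(c);
      case '.' :
        printf("Warning: avoid using dot.*n");
        return(c);
      default :
        printf("Invalid character:'%c'.*n",c);
        exit();
    }
}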

Another type of heading that didn't appear in the first example is a relational heading. This uses one of the comparison operators: ">", "<", ">=", "<=". For example,

switch(i) {
    case <0 :
        printf("Number is negative");
        break;
    case 0 :
        printf("Number is zero");
        break;
    case >0 :
        printf("Number is positive");
        break;
}

uses the switch statement to determine the sign of an integer.

Note that the key of the switch statement need not be a simple variable like "c" and "i" in our examples; any valid expression will do. Similarly, the case headings will accept constant expressions in place of the constants, as in

case ('a' << 9) :: (~0777) :

Note also that a single statement may have several case headings preceding it. This was the case above where the one break statement was used whether "c" was a lower case letter, a digit, or an underscore.

17.1 Break and Next:

We have already shown one use of the break statement in the previous section. Breaks can also be used in while statements, for statements, repeat statements, and do-while statements. In all these cases, break causes the program to break out of the looping statement and go on to the statement following the loop.

The next statement serves a similar purpose to break inside while statements, for statements, and do-while statements. next causes the program to break out of the current loop and to go to the test which determines whether or not to loop again. In other words, it tells the program to start the next iteration of the current loop. In a repeat loop, next causes the program to go back to the start of the loop and continue repeating. next cannot be used in switch statements.

next is not used all that often, but you can get into trouble if you forget that it exists. Unsuspecting programmers frequently try to use "next" as a variable name and end up with a massive collection of error messages. Remember that "next" is a keyword and therefore can't be used for anything else other than next statements.
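
The loop below is a small sketch of our own that uses both statements. It copies the first line of input to the output, leaving out any blanks. It relies on GETCHAR returning an ASCII null at end-of-file.

main() {
    auto c;
    while ( (c = getchar()) != '*0' ) {
        if (c == ' ') next;       /* skip blanks */
        if (c == '*n') break;     /* stop at end of line */
        putchar(c);
    }
    putchar('*n');
}

When "next" is executed, control goes straight back to the test in the "while"; when "break" is executed, control goes to the final PUTCHAR after the loop.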

18. More on I/O

In this section, we talk about redirection of I/O and the handling of I/O units.

18.1 Redirection of I/O:

In Section 8, we mentioned that the current output unit and the current input unit are both set to the terminal by default. The easiest way to change this is by redirecting these I/O units in the command line that executes your program.

Normally, you would execute your B program with a command of the form

go:filename

where "filename" is the name of the file that contains your compiled program. If you add a construction of the form "<filename" to this command line, the current input unit will be set up to point to "filename" at the beginning of program execution. For example,

go:myprog  <fbaggins/infile

will look for a compiled program in "myprog" and will begin execution of that program with the current input unit set to a file called "fbaggins/infile".

Similarly, a construction of the form ">filename" on the command line sets the current output unit to "filename" at the beginning of program execution. Thus

go:prog2  >sgamgee/outfile

will begin execution of "prog2" with the current output unit set to the file "sgamgee/outfile". The characters '<' and '>' were chosen to indicate redirection of I/O because their appearance suggests data coming from a file and going into a file respectively.

Note that this redirection of I/O happens at execution time, not during compilation. The program need never know whether it is writing to a terminal or a disk file in many cases. (This of course depends on the kind of data that is being read or written; you would not want streams of unprintable characters to be sent to a terminal, though you could write such things to a disk file. Despite such possibilities, this approach to redirecting I/O can still be regarded as comparatively transparent to the B program itself.)
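
To illustrate, the filter below (a sketch of our own) copies its input to its output, converting lower case letters to upper case. It uses only GETCHAR and PUTCHAR, so it contains no file names at all.

main() {
    auto c;
    while ( (c = getchar()) != '*0' ) {
        if (c >= 'a' && c <= 'z') c += 'A' - 'a';
        putchar(c);
    }
}

If the compiled program were kept in a file called "upper" (a name we have made up for this example), the command

go:upper  <fbaggins/infile

would convert the contents of "fbaggins/infile" and write the result to the terminal, while a plain "go:upper" would read from the terminal instead.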

18.2 Multiple Units:

Many programs will function quite happily with only one input unit and one output unit throughout the course of execution. In cases like this, the ability to redirect I/O on the command line is all that is necessary. However, for more complex programs it is frequently necessary to have more than just two units.

Before you can use an I/O unit, the unit must be opened. The current input and output units are opened for you by the routines that set things up before your program begins executing. However, all other I/O units must be opened explicitly with a call to the library function OPEN. This call has the form

unit = open(filename,mode);

where "unit" is an ordinary B variable, "filename" points to a string containing the name of the file you want to open, and "mode" is a string which contains some information about the way you intend to use the file. For example,

inunit = open("gollum/ringfile","r");

opens a file called "gollum/ringfile". The "r" mode above indicates that the file is to be opened for reading. Below we list the modes which are most commonly used when opening a file. For a complete list of open modes, see "expl b lib open".

"r"     Open the file for reading.
"w"     Open the file for writing.
"a"     Open the file for appending. This means that you want to write on the file, but you want the data to be put on the end of the file rather than writing over anything that might be there already. If there is nothing in the file, "a" mode is the same as "w".
"u"     Do not let this call to OPEN change the current I/O units.

The "u" option needs a little more explanation. When a file is opened for reading, OPEN normally makes that file the new current input unit. Similarly, when a file is opened for writing or appending, OPEN normally makes that file the new current output unit. If you want to open a file but do not want your current I/O units to change, you should use the "u" option in the call to OPEN, as in

outunit = open("smaug/outfire","wu");

You might be wondering why we have shown our calls to open in the form

variable = open(...);

What value does the variable receive from the OPEN function? The answer is that OPEN returns an integer which is called the unit number. This unit number can then be used in I/O calls to specify what input or output unit should be used. For example,

printf(outunit,"This goes to smaug/outfire.");

will print the given string on the I/O unit with the number contained in the variable "outunit". This will be the file "smaug/outfire" if the unit was opened with the above call to OPEN. Similarly,

c = getc(inunit);

will get a character from the input unit "inunit" that was opened for the file "gollum/ringfile". (The function GETC is the same as GETCHAR, except that GETCHAR will not take a unit number as an argument and GETC will. GETCHAR always reads from the current input unit.)

You might be a little confused about the call to PRINTF above. In earlier chapters we have been specifying the format string first, but now we have slid in the unit before the format string. The fact is that the unit is an optional argument in calls to PRINTF. If the unit number is there, PRINTF will send its output to that unit (provided of course that the unit is open for output). If the unit number is not there, PRINTF will send its output to the current output unit.

Next you might ask how PRINTF knows whether its first argument is a format string or a unit number. The answer is that unit numbers are always integers between -5 and 19 (inclusive). Thus if the first argument is a number in this range, PRINTF assumes it's a unit number. This is a fairly safe assumption, since you should not have strings starting in such low memory addresses. If you do have such low-address strings, your program is very, very sick.
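
The fragment below is a sketch of how explicit unit numbers might be used to copy one file to another without disturbing the current I/O units. The file names are arbitrary, and we are assuming that "u" may be combined with "r" in the same way that it is combined with "w" above.

main() {
    auto inu,outu,c;
    inu = open("nail/file","ru");
    outu = open("circular/file","wu");
    /* GETC returns an ASCII null at end-of-file, like GETCHAR */
    while ( (c = getc(inu)) != '*0' ) printf(outu,"%c",c);
    close(inu);
    close(outu);
}

Because both files were opened with the "u" option, any ordinary PRINTF or GETCHAR calls elsewhere in the program would still use the original current units.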

Below we give a very simple program for copying one file into another.

FILEIN = "nail/file";
FILEOUT = "circular/file" ;
main() {
    auto inu,outu;
    inu = open(FILEIN,"r");
    outu = open(FILEOUT,"w");
    while ( putchar(getchar()) != '*0') ;
    close(inu);
    close(outu);
}

Note that we have used manifest constants to hold the names of our files. This is a very common practice. To change the file names, we need only change the manifests where they are defined at the start of the program; we do not have to change the program itself. Note also that we did not need to specify unit numbers for the calls to PUTCHAR and GETCHAR -- the act of opening the two files set the current input and output units to the files we wanted. Since both PUTCHAR and GETCHAR use the current I/O units, we didn't have to give the unit number explicitly.

Lastly, note that there doesn't appear to be a statement following the "while" condition. There is a statement there, but it is a special statement known as the null statement. The null statement is a very simple one:

;

It doesn't do anything, but it is handy for just this sort of situation: "while" loops in which all the work is done inside the condition expression. During the condition testing, the while loop will repeatedly read in a character and copy it out until it comes upon an ASCII null 000. This is the character that GETCHAR returns to indicate end-of-file.

The CLOSE function used in the example above will be discussed in the next section.

18.3 Closing Files:

When you are finished with an open file, it is a good idea to close it with the library function CLOSE. This is not completely mandatory, since there are wrap-up routines which will close any files that are still open when your program finishes execution. However, the information that a program needs to maintain open files takes up about 500 machine words for every file, so you can save a lot of memory space if you close your files when you're finished with them.

The usual format of a call to CLOSE was shown in the previous section:

close(unitnum);

where "unitnum" is the number of the unit you wish to close.

18.4 .READ and .WRITE:

We have said that a call to OPEN changes one or both of the current I/O units. To change one of the current I/O units in the middle of your program without opening a file, you can use the library functions .READ and .WRITE. For example,

oldunit = .read(newunit);

changes the current input unit to unit number "newunit". .READ returns the number of the old current input unit to the variable "oldunit". A similar call to .WRITE will change the current output unit and return the number of the previous current output unit.

The program below is a very simple routine for writing messages to the terminal in the middle of program execution.

errmsg(n) {
    extrn msglist,errunit;
    auto savunit;
    savunit = .write(errunit);
    printf("%s*n",msglist[n]);
    .write(savunit);
}
errunit {-4};
msglist[] {"Error 0", "Error 1", "Error 2", ...};

A function wishing to issue an error message would simply issue a call of the form

errmsg(n);

where "n" is the number of the message the function wishes to issue. "errmsg" makes a special "error-unit" the current output unit, since "errmsg" has no idea what unit is current when it is called. The number of the "error-unit" is stored in the external variable "errunit". Using the argument "n" as an index into a list of error messages, "errmsg" prints the diagnostic and then restores whatever output unit was current at the time "errmsg" was called.

In this case, we have chosen the "error-unit" to be unit -4. The reason for this is given in Section 18.5.

Note that we could have avoided using the .WRITE function above by specifying a unit number with the PRINTF as in

printf(errunit,"%s*n",msglist[n]);

We chose the given form simply to demonstrate how to use .WRITE.
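
The same save-and-restore pattern works for input with .READ. The function below is a sketch of our own (the name "readfrom" is hypothetical). It reads one character from a given unit and then puts the current input unit back the way it was.

readfrom(unit) {
    auto old,c;
    old = .read(unit);    /* make "unit" the current input unit */
    c = getchar();
    .read(old);           /* restore the previous input unit */
    return(c);
}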

18.5 Special Units:

There are several I/O units which have specialized functions. In this section, we will briefly describe these functions.

Unit zero is known as the standard input unit. This unit is opened during the set-up operations for program execution -- it does not have to be opened explicitly by the program itself. This unit is the current input unit at the beginning of execution. If the redirection form "<file" is used on the command line which invokes the B program, unit zero is associated with "file"; otherwise, unit zero is associated with the terminal.

Unit one is known as the standard output unit. It is opened during program set-up and is the current output unit at the beginning of program execution. If the redirection form ">file" is used on the command line which invokes the B program, unit one is associated with "file"; otherwise, unit one is associated with the terminal.

There are five units with negative unit numbers. These are all opened during program set-up and they cannot be changed by the program. Units -2 and -3 are used for console output and input when B is used in batch. Units -4 and -5 are always associated with the terminal in TSS. Thus if your program wishes to issue an error message that absolutely must go to the terminal, it can use unit -4 as the output unit. Most messages should be issued through unit 1, since this is the standard output unit. The user can use command line redirection to send such messages to a file for later examination. However, if the message is so important that you want it to go to the terminal regardless of where other output is being redirected, unit -4 is available. In the same way, unit -5 always gets its input from the terminal, regardless of command line redirection of input. If your program issues an urgent request for input through unit -4, you will probably use unit -5 to read that input.
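
As an illustration, the function below (a sketch of our own, with a made-up name and prompt) asks a question on unit -4 and reads the reply from unit -5, so the exchange takes place at the terminal even if units 0 and 1 have been redirected to files.

confirm() {
    printf(-4,"Overwrite the file? (y/n) ");
    return( getc(-5) == 'y' );
}

The function returns one if the first character of the reply is 'y' and zero otherwise. Any other characters typed on the same line are left unread on unit -5.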

Unit -1 serves a rather different purpose in TSS. All output to unit -1 behaves as if it were typed at system level. For example,

printf(-1,"release the/dogs*n");

would use the system "release" command to release the specified file.

Below we give an example of an extremely simple shell program that makes use of these special units. The program's only feature is that it prompts for commands with a user-chosen phrase.

main() {
    auto command[64];
    repeat {
        printf(-4,"*nYes, master?  ");
        getline(-5,command);  /* argument order assumed */
        printf(-1,"%s*n",command);
    }
}

This shell will prompt for a command with "Yes, master?". It obtains the command using the GETLINE library function which simply reads in a line of input and stores it in the space allocated for the vector "command". The final PRINTF essentially issues that command to the system. The same effect would be obtained if we used the SYSTEM library function -- see "expl b lib system".

18.6 File Access Conventions:

B follows the standard TSS file access conventions. These are sketched briefly below.

If the name of a file being opened contains slash characters or dollar signs, or if the name is more than nine characters long, the file is assumed to be permanent. If the name is not of this sort, the file accessor will first search your AFT for a file of the given name. If the file is not found in the AFT, the accessor searches for a file of that name in the current catalog.

If the search for the file fails, there are two possibilities. If the file was being opened for input, OPEN will issue a diagnostic message and abort your program. If the file was being opened for output, OPEN will attempt to create the file. If the nature of the file name indicates that the file is permanent, OPEN tries to create a permanent file; otherwise, OPEN will try to create a temporary file.

If a file was already in the AFT when accessed, it will not be removed when closed. Temporary files created during the course of program execution are not removed from the AFT when they are closed either. If a permanent file was not in the AFT before it was accessed, it will be removed when it is closed.

18.7 String I/O:

In B it is possible to open a string for I/O in the same way that you would open a normal I/O file or device. This is done by including an 's' in the mode string that is passed to OPEN. For example,

strunit = open(str,"ws");

opens the string "str" for writing. Thus

printf(strunit,"hello");

would put "hello" into the string "str". Remember that opening a string for I/O makes the string a current I/O unit unless the "u" mode is specified in the call to OPEN.

When the CLOSE function is used to close a "string unit", it appends a '*e' to the string in order to mark "end-of-string".
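
One use for string I/O is building up a formatted string with PRINTF. The program below is only a sketch: the variable names are ours, and we are assuming that the "s" and "u" modes can be combined with "w" in a single mode string.

main() {
    auto str[10],strunit;
    strunit = open(str,"wsu");
    printf(strunit,"count=%d",42);
    close(strunit);       /* appends the '*e' to str */
    printf("%s*n",str);
}

If the assumptions hold, the final PRINTF writes "count=42" on the current output unit, which has not been changed because of the "u" option.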

19. More about Functions

This section deals with some of the niceties of calling functions.

19.1 Arguments:

Whenever a function is called in a B program, the arguments are passed to the function by value. This means that a function can change the value of any of its arguments without affecting anything in the calling program. Try it yourself with a simple program like

main() {
    auto i;
    i = 3;
    func(i);
    printf("%d*n",i);
}
func(val_of_i) {
    val_of_i = 4;
}

The PRINTF will print the integer 3 since that is the value "i" has in "main"; "i" in "main" is not affected by the change of "val_of_i" in "func". Thus a function has no direct way of changing one of its arguments and returning that changed argument to the caller.

However, there is an indirect way of doing this. If the caller passes the address of a variable instead of the contents of a variable, a function can use the variable's address to get at the contents. Thus if we really wanted "func" to change the value of "i" in "main", we could change the above program to

main() {
    auto i;
    i = 3;
    func(&i);
    printf("%d*n",i);
}
func(ptr_to_i) {
    *ptr_to_i = 4;
}

Now the PRINTF will print the value 4. "func" receives the address of "i" and uses the indirection operator "*" (see Section 14.5) to go through the variable's address to change the variable's contents.

Note that the situation is different when the caller passes a vector or a string as an argument. To pass a vector as an argument, you pass the vector pointer. Thus the called function always gets the address of the vector and so can get at the contents of the vector. The same thing holds for strings.
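
The sketch below (our own; the function name "bump" is hypothetical) shows the effect. No "&" is needed in the call, because "v" already holds the address of the vector, and the change made by "bump" is visible in "main".

main() {
    auto v[3];
    v[0] = 1;
    bump(v);
    printf("%d*n",v[0]);    /* prints 2, not 1 */
}
bump(vec) {
    vec[0] = vec[0] + 1;
}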

19.2 Recursion:

In B, a function may call itself recursively, either directly or by calling a second function which eventually calls the first. The auto variables of multiple copies of the same function are totally separate. In other words, the various copies of the same function should be thought of as completely different entities which just happen to execute the same progression of statements.

Below we present a simple recursive function that reverses the order of the elements in a vector.

revvec(vec,n) {
    auto t;
    t = vec[0];
    vec[0] = vec[n];
    vec[n] = t;
    if (n > 2) revvec(vec+1,n-2);
}

"revvec" reverses the elements of "vec" where "vec" has a maximum index of "n". It does this by switching the first and last elements and then calling itself to reverse the vector which begins at "vec[1]" and ends at "vec[n-1]". With each recursive step, two of the vector elements are switched and the number of words still to be switched is reduced by two. (Naturally this process could also be implemented without using recursion, but it wouldn't be as much fun.)

19.3 Limitations on Function Calls:

Every time one function calls another function there is a certain amount of overhead involved. It takes time to set things up for a new function; more importantly, it takes space to keep track of which function called which, where each function should return when it's finished, and so on. In addition to the space needed to keep this information, each time a function is called, space must be allocated for the function's auto variables.

The space for auto variables and for function invocation information is obtained from a region in memory called the stack. The stack has ample room to handle a reasonable number of function calls, but it is finite. If, for example, you tried to use the "revvec" function of Section 19.2 to reverse a vector that is 1000 words long, you would have to make 500 recursive function calls. Such a gigantic number of function calls would overflow your stack space and cause the contents of the stack to spill all over the rest of your program; naturally the result would be disaster.
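
One way around the problem is to do the same job with a loop. The function below is a sketch of our own (the name "irevvec" is hypothetical). It reverses the same vector using a single function call, so the amount of stack space it needs does not depend on the length of the vector.

irevvec(vec,n) {    /* n is the maximum index of vec */
    auto i,t;
    i = 0;
    while (i < n) {
        t = vec[i];
        vec[i] = vec[n];
        vec[n] = t;
        ++i;
        --n;
    }
}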

Most programs really don't need to worry about this happening. When a function finishes execution, it is cleared off the stack so that the same space can be used for other functions. Thus the number of functions your program calls is unimportant as long as the depth of function calls is not too great. If your functions are likely to nest themselves to great depth, you can decrease the probability of stack overflow by decreasing the number of auto variables in each function. Remember, auto variables are stored on the stack; external variables on the other hand are stored elsewhere and thus are not subject to the same size restrictions. Large auto vectors eat up a lot of stack space; if you want a local vector of more than, say, 64 words, you would be better to obtain storage using GETVEC rather than allocating the storage in an auto statement. GETVEC does not steal storage from stack space, but obtains chunks of memory from other sources. (Of course, if you use GETVEC to obtain storage for an auto vector, you should remember to RLSEVEC that vector before leaving the current function.)

For further information about the stack, see the B Reference Manual.

20. Program Command Lines

Sooner or later, most programmers will want to write a program that will accept arguments typed in on the command line. In this section, we will talk about the various ways of accessing such arguments.

20.1 Arguments for "main":

When the function "main" is called, it is passed two arguments. Up until this point in our tutorial, we have completely ignored these arguments since our programs work perfectly well without them. However, if you want to write a program which obtains information from the command line, you will want to specify these arguments in the first line of "main". Thus you would write

main(argc,argv) {

and continue on with the rest of "main".

The argument "argc" is a count of the number of distinct entities on the command line and the argument "argv" is a vector containing these entities. Thus if your command line was

go  infile  outfile

you would have

argc set to the value 3
argv[0] pointing to the string "go"
argv[1] pointing to the string "infile"
argv[2] pointing to the string "outfile"
argv[3] set to -1
   (the last element of "argv" is always -1)

Often this very simple-minded breakdown of arguments is all you need to obtain your information. For example,

in = open(argv[1],"r");
out = open(argv[2],"w");

opens "infile" for reading and "outfile" for writing. In this simple way, you have obtained two file names from the command line and opened them for use by your program. Naturally there are other ways to obtain information (e.g. prompting the user) but taking advantage of this ability to read the command line is a very neat approach to the problem.

The function which parses the command line and creates "argc" and "argv" is called .BSET. .BSET is also responsible for handling the I/O redirection constructs "<file" and ">file" when they appear on the command line. The capabilities of .BSET do not stop here. The function is also capable of classifying the kinds of arguments which appear in the command line. In the following section, we will discuss how .BSET performs this operation.

20.2 The Options Table:

The full capabilities of .BSET can be realized through the use of an options table. This is an external vector named ".optab" in which your program describes the keywords you want to recognize on the command line. ".optab" should consist of a list of two-word elements terminated by a word containing -1. Each two-word entry should consist of a pointer to a string containing the keyword, and a constant specifying the form in which the keyword may appear (in other words, this constant tells whether the keyword will appear in an expression like "-keyword", "+keyword", "keyword=value", and so on). The file "b/manif/.bset" contains manifest constant definitions for the constants that specify keyword form. Below we list some of the most commonly used manifests from this file.

This indicates arguments of the form -<keyword>.
PLUS_KWD
This indicates arguments of the form +<keyword>.
This indicates arguments of the form <keyword>.
NVAL_KWD
This indicates arguments of the form <keyword>=<number>. If <number> begins with a leading zero, it is taken to be an octal constant; otherwise, it is taken to be decimal.
SVAL_KWD
This indicates arguments of the form <keyword>=<string>, where <string> is enclosed in either single or double quotes.
MNVL_KWD
This indicates a "multiple numeric value" keyword. This type of keyword can have either of the forms
<keyword>=<number>,<number>,<number>,...
<keyword>="<number> <number> <number> ..."

For example,

<keyword>=1,2,3,4
<keyword>="1 2 3 4"

are two examples of the same multiple values. As with NVAL_KWD keywords, the numbers are taken to be octal if they have leading zeroes and decimal otherwise.
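
The leading-zero rule for these numbers can be sketched as follows. This is a C model and not the .BSET source; the name parse_kwd_number is our own invention, and we have assumed that signs are not accepted and that a digit 8 or 9 in an octal number is an error.

```c
/*
 * Convert the <number> part of a keyword argument.
 * A leading zero selects octal; anything else is decimal.
 * Returns 1 and stores the value on success, 0 on a bad digit.
 * (Hypothetical helper; overflow is not checked.)
 */
int parse_kwd_number(const char *s, long *value)
{
    int base = (s[0] == '0') ? 8 : 10;
    long n = 0;

    if (s[0] == '\0')
        return 0;                    /* empty string */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s >= '0' + base)
            return 0;                /* not a digit in this base */
        n = n * base + (*s - '0');
    }
    *value = n;
    return 1;
}
```

Under this rule "80" yields 80 while "010" yields 8, so a user who types a leading zero out of habit will get a different number from the one intended.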

Below we give the options table for a simple program that reads in a file and writes it out again, expanding tab characters in the process. Once we have explained the options table, we will give the actual program.

.optab[] {
    "Tabs",       MNVL_KWD,
    "Fillchar",   SVAL_KWD,
    "LineLength", NVAL_KWD,
    "Info",       PLUS_KWD,
    -1        /* end of list */
};

The "tabs" option specifies the positions at which tab stops should be placed. If the tabs are not specified explicitly, they are placed every ten columns along the line. The "fillchar" option specifies what character should be used to pad out tabs; if not specified, blanks will be used. The "linelength" option specifies the maximum length of any line, once the tabs have been expanded; if expansion of tabs causes any lines to become too long, the program will print a warning message. If "linelength" is not specified, the maximum is set to 100. Finally, the "info" option tells the program to print out information on how many lines were read and how many tabs were expanded. This information is printed on the terminal.

Note that the keyword strings above have some characters in lower case and some in upper case. This is done in order to allow the user to abbreviate some of the keywords. In any valid abbreviation of a keyword, the upper case letters must be included, while the lower case letters may be included if the user wants. Thus "LineLength" could be validly abbreviated as

ll
linel
llength
linelen

and many other combinations. Any or all of the lower case letters may be omitted. None of the upper case letters may be. (Note that it is the programmer's responsibility to ensure that the minimal abbreviations are not ambiguous; .BSET does not check to see if the same abbreviation is valid for two different keywords.)
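
The abbreviation rule can be modelled in a few lines. The sketch below is in C rather than B, and it is not the .BSET code; the name abbrev_ok is hypothetical. We assume that letters are compared without regard to case (the "ll=80" command line suggests this) and that the letters of an abbreviation must appear in the same order as in the keyword.

```c
#include <ctype.h>

/*
 * Return 1 if "in" is a valid abbreviation of the keyword "pat".
 * Upper case letters of "pat" are required; every other character
 * of "pat" may be present or omitted.  (Hypothetical helper.)
 */
int abbrev_ok(const char *pat, const char *in)
{
    int same;

    if (*pat == '\0')
        return *in == '\0';      /* both must end together */
    same = *in != '\0' &&
           tolower((unsigned char)*pat) == tolower((unsigned char)*in);
    if (isupper((unsigned char)*pat))
        return same && abbrev_ok(pat + 1, in + 1);
    /* optional character: either use it or skip it */
    return (same && abbrev_ok(pat + 1, in + 1)) || abbrev_ok(pat + 1, in);
}
```

With the keyword "LineLength", this accepts "ll", "linel", and "linelength", and rejects "line" because the second upper case letter is missing. The same function could be run over every pair of keywords in a table to look for the ambiguous abbreviations that .BSET does not check.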

Below we show a few examples of command lines which could be used to invoke the tabs program.

go:tabs +info ll=80
This command line specifies that the maximum line length should be 80 characters, and that information about tab expansion should be printed when the program is finished.
go:tabs fill="*" tabs=5,10,15,20,25,30
This command specifies that the fill character should be the asterisk and that tab stops should be set in the given columns.
go:tabs +info <infile >outfile
This command specifies that information about tab expansion should be printed when the program is finished. It also specifies an input file containing the input text and an output file which will receive the text with the tab characters expanded. .BSET will automatically set the current input unit to "infile" and the current output unit to "outfile" when it recognizes the "<" and ">" characters; these arguments will not be entered in "argv" since .BSET already knows how to take care of them. This would be the usual way of calling the "tabs" program; without the I/O redirection constructs on the command line, the program would take input from the terminal and type it right back out again, a somewhat pointless procedure.

The options table tells .BSET what to look for. We will now discuss how .BSET tells your program what it's found.

As we stated before, the entries of the "argv" vector contain pointers to strings that contain the information which was obtained from the command line. However, addresses in B only require half a word; thus each entry in "argv" has 18 free bits which are not used for the string pointer. .BSET uses this upper half of each word to indicate which keyword it has identified. For example, if it identifies the keyword "tabs", it sets a 1 in the upper half of the corresponding "argv" entry, since "tabs" is the first entry in ".optab". Similarly if .BSET finds a "fillchar" it puts a 2 in the upper half of the corresponding "argv" entry, and so on for every entry in ".optab". The upper half of the "argv" entry gets the number of the entry in ".optab" which contains the recognized keyword.
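
The two halves of an "argv" entry can be pictured with the following C sketch, which models a 36-bit word inside a 64-bit integer. It is an illustration only: make_entry, entry_number, and entry_addr are names we have made up, and real B programs simply use the "&" and ">>" operators on the word itself.

```c
#include <stdint.h>

#define HALF_MASK 0777777ULL   /* low 18 bits of a 36-bit word */

/* Build an "argv" entry: ".optab" entry number above, address below. */
uint64_t make_entry(unsigned entry, uint64_t addr)
{
    return (((uint64_t)entry & HALF_MASK) << 18) | (addr & HALF_MASK);
}

/* Upper half: which ".optab" entry was recognized. */
unsigned entry_number(uint64_t word)
{
    return (unsigned)((word >> 18) & HALF_MASK);
}

/* Lower half: address of the information for this argument. */
uint64_t entry_addr(uint64_t word)
{
    return word & HALF_MASK;
}
```

For instance, an entry for the third keyword in ".optab" whose information lives at octal address 1234 would be the word 03001234; masking with 0777777 recovers the address and shifting right by 18 recovers the 3.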

We have said that the bottom half of each "argv" entry points to a location in memory which contains information obtained from the command line. With no options table, this information was simply stored in ASCII string format, with a separate string for each entity on the command line. However, an options table tells .BSET what sort of information will be associated with each keyword, and thus .BSET can do more processing on each separate entity.

If an argument is of the form "<keyword>=<number>" (i.e. an NVAL_KWD keyword), the lower half of the corresponding "argv" entry points to a machine word containing the actual numeric value of <number>. Thus if the option was "ll=80", the "argv" entry would point to a word that contained the integer 80; without the option table entry, .BSET would not know how to break the option down, and would have to settle for an "argv" entry that pointed to the string "ll=80". You would then have to analyze this string and pluck out the required number yourself, a more tedious process than letting .BSET do the work for you.

If an argument is of the form "<keyword>=<string>" (i.e. an SVAL_KWD keyword), the pointer in "argv" is to the <string> itself.

For a multiple numeric value keyword, the "argv" entry points towards a vector. The zero'th word of the vector contains the maximum index of the vector; the rest of the vector contains the individual numbers that were specified as the multiple values for the keyword. For example, if the command line contained

tabs=10,20,30

the "argv" entry would point to a four-word vector whose zero'th word was 3 (since there are three values specified) and whose remaining words contained the numbers 10, 20, and 30 respectively.
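
The layout of such a vector can be sketched in C as follows. This is our own model, not the .BSET code: build_mnvl is a hypothetical name, and we have assumed that the values may be separated by commas or blanks and that each value follows the leading-zero rule for octal.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Fill vec[1], vec[2], ... with the numbers found in "text" and
 * store the highest index used in vec[0].  "cap" is the number of
 * words available in vec.  Returns the count, or -1 on an error.
 */
int build_mnvl(const char *text, long *vec, int cap)
{
    const char *p = text;
    char *end;
    int n = 0;

    if (cap < 1)
        return -1;
    while (*p != '\0') {
        if (*p == ',' || *p == ' ') {
            p++;                         /* skip a separator */
            continue;
        }
        if (!isdigit((unsigned char)*p) || n + 1 >= cap)
            return -1;                   /* bad character or no room */
        vec[++n] = strtol(p, &end, *p == '0' ? 8 : 10);
        if (*end != '\0' && *end != ',' && *end != ' ')
            return -1;                   /* stray digit such as "08" */
        p = end;
    }
    vec[0] = n;                          /* maximum index */
    return n;
}
```

Given the text "10,20,30", the function fills a four-word vector with 3, 10, 20, and 30. A program can therefore walk the values with an index that runs from 1 up to the number in word zero.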

20.3 The Tab Expansion Program:

This is the last programming example we will give in this tutorial. It is a good exercise to go through this program and make sure you understand every line; not only is this good practice for reading other people's programs, but it should help you to write your own programs in future.

We remind you that this is only an example. If you were actually going to write a B program to expand tabs, you would likely use the TABSET library function.

We will not explain our program in any great detail. We have already specified in the last section how the program is supposed to work. We might point out that the program does little real error checking; an actual tab expansion program would have to check to see if the tab stops specified on the command line were in the correct order, for example. The only real note we will make concerns the second line of the program. A line of the form

%filename

tells the compiler to include the contents of the file "filename" when it compiles the program. In the example below, the "inclusion file" contains a list of manifest constant definitions. This is a very common use of the concept of inclusion, since manifests frequently can be used by more than one program. If the manifest definitions are kept in a separate file, all the programs which need them can reference them using "%" for inclusion.

And now we present the "tabs" program.

/* define manifests for option table */
%b/manif/.bset
/* define options table */
.optab[] {
    "Tabs",       MNVL_KWD,
    "Fillchar",   SVAL_KWD,
    "LineLength", NVAL_KWD,
    "Info",       PLUS_KWD,
    -1        /* end of list */
};
/*
 * define our own manifests based on the
 * option table
 */
TABS = 1;
FILL = 2;
LLEN = 3;
INFO = 4;
main(argc,argv) {
    extrn deftabs,fchar,linlen,info_sw;
    extrn outlen,tabseen,tabsexp,lines;
    auto i,c;
    auto type,   /* type of command line keyword    */
         ptr;    /* pointer to command line keyword */
    /*
     * obtain the options from argv,
     * skipping argv[0] since that is just
     * the "go:tabs" string.
     */
    i = 0;
    while (++i < argc) {
        ptr = argv[i] & 0777777;
        type = argv[i] >> 18;
        switch (type) {
          case TABS:
              /*
               * redirect deftabs to new
               * tab table
               */
              deftabs = ptr;
              break;
          case FILL:
              /*
               * change fill character
               */
              fchar = char(ptr,0);
              break;
          case LLEN:
              /*
               * change default line length
               */
              linlen = ptr[0];
              break;
          case INFO:
              /*
               * set info switch
               */
              info_sw = 1;
              break;
          default:
              /* anything else is wrong */
              printf("TABS: Unrecognized keyword: %s.*n",
                  ptr);
        }
    }
    /*
     * we are now ready for action
     */
    outlen = 0;
    while ( (c = getchar()) != '*0')
        /*
         * '*0' indicates end-of-file
         */
        switch (c) {
          case '*n':
              /*
               * end of line
               */
              if (outlen > linlen) warn();
              outlen = 0;
              ++lines;
              putchar(c);
              break;
          case '*t':
              /*
               * tab
               */
              exptab();
              break;
          default:
              /*
               * any other character
               */
              ++outlen;
              putchar(c);
        }
    if (info_sw) {
        /*
         * when +info is required
         */
        printf(-4,"Number of lines: %d.*n",
            lines-1);
        printf(-4,"Number of tabs read: %d.*n",
            tabseen);
        printf(-4,"Number of tabs expanded: %d.*n",
            tabsexp);
    }
}
warn() {
    /*
     * issues warning when a line is too long
     */
    extrn lines,outlen;
    printf(-4,"Line %d of output file ",lines);
    printf(-4,"contains %d characters.*n",outlen);
}
exptab() {
    /*
     * expands tab characters
     */
    extrn outlen,deftabs,tabseen,tabsexp;
    extrn fchar,lines;
    auto i;
    ++tabseen;
    i = 1;
    while (outlen >= deftabs[i])
        if (++i > deftabs[0]) {
            printf(-4,"Tab not expanded in line %d.*n",
                lines);
            return;
        }
    ++tabsexp;
    while (outlen < deftabs[i]) {
        putchar(fchar);
        ++outlen;
    }
}
outlen {0};  /* length of output line */
tabseen {0}; tabsexp {0}; lines {1};
deftabs[]   /* default tab stops */
    {10,10,20,30,40,50,60,70,80,90,100};
fchar {' '};  /* default fill character */
linlen {100};  /* default line maximum */
info_sw {0};
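
Readers who would like to experiment with the expansion rule on a present-day machine can use the following C model of "exptab". It is a sketch and not a translation of the B program: the name expand_line and the output buffer are our own, but the tab table has the same layout as "deftabs", with the highest index in word zero. We assume that a tab which falls beyond the last stop is simply dropped.

```c
#include <stddef.h>

/*
 * Copy "in" to "out", replacing each tab by enough "fill"
 * characters to reach the next stop.  stops[0] holds the highest
 * index; stops[1], stops[2], ... hold the stops in increasing order.
 * Returns the number of tabs that could not be expanded, or -1 if
 * "out" (capacity "cap") is too small.
 */
int expand_line(const char *in, char *out, size_t cap,
                const int *stops, char fill)
{
    size_t outlen = 0;
    int skipped = 0;
    int i;

    if (cap == 0)
        return -1;
    for (; *in != '\0'; in++) {
        if (*in != '\t') {
            if (outlen + 1 >= cap)
                return -1;
            out[outlen++] = *in;
            continue;
        }
        /* find the first stop beyond the current position */
        i = 1;
        while (i <= stops[0] && outlen >= (size_t)stops[i])
            i++;
        if (i > stops[0]) {
            skipped++;               /* no stop left on this line */
            continue;
        }
        while (outlen < (size_t)stops[i]) {
            if (outlen + 1 >= cap)
                return -1;
            out[outlen++] = fill;
        }
    }
    out[outlen] = '\0';
    return skipped;
}
```

With stops at 4, 8, and 12 and a fill character of '.', the line consisting of "a", a tab, and "b" comes out as "a...b".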

21. BOFF

Programmers know that newly written programs very seldom work the first time they are run. If errors occur during compilation, the B compiler can issue a number of diagnostic messages, most of which appear in Appendix B. Once you have cleaned out compilation errors, it is still quite possible that there will be errors during execution of the program. When run, your program could do nothing, it could go into an infinite loop, or it could abort with a memory fault or some other hardware-detected error.

The BOFF subsystem is a very useful tool for detecting errors in B programs. BOFF stands for "B Obscure Feature Finder". You can use BOFF to inspect and/or patch the core image file of your program as prepared by the loader, to monitor the progress of your program as it is running, or to inspect a TSS dump file called "abrt" after your program has failed.

In this section, we will give a very brief description of some of the most basic features of BOFF. Those who wish fuller details on the capabilities of BOFF should see the BOFF Manual and the appropriate explain files.

21.1 Examining Dumps:

Often, when you try to run a program for the first time it will abort. An abort file containing the dump of the program will be put in your AFT. To look at your dump, type

boff abort s=load

where "load" is the name of the original load module for your program (normally ".h"). The "s=load" tells BOFF where to find the debug tables for your program; although these will be present in your original load module "load", the tables themselves are usually released after a program has been loaded into core, and thus they will not be available in the abort dump.

BOFF will reply with the fault type and the value of the instruction counter at the time of the fault. If debug tables are available in your program, BOFF will also give the location where the program aborted, expressed as an offset from the load table name whose address is closest to and less than the abort location.

In BOFF, all numbers are displayed in octal, but when examining the contents of memory locations you have the option of changing the form of this display. When BOFF reads a number from the terminal, the number must be an integer. If the integer has a leading zero, it is taken as an octal constant; otherwise, it is taken as a decimal constant. Thus, every time you wish to type in an octal number, you must remember to add the leading zero.

Once you get an abort dump, you will likely want to get a traceback of function calls in reverse order. Type


after the "boff abort" system command. The traceback will show the stack pointer used during each function call, the function name, the arguments, and the symbolic address from which it was called.

When looking at a traceback, there are a couple of things you should keep in mind. You may see function names in the upper part which are not part of your program. This means your program aborted in the middle of a library function. The arguments displayed for a library function may not look the way you expect; this could happen because your program passed bad arguments, but it is also common that library functions modify their arguments or use them for different purposes.

If you are lucky, your problem will be a bad argument and a solution will suggest itself. More likely, you will want to inspect the state of the various variables in the program. You can look at any external value at any time, but you can only look at the arguments and auto variables of the current function, which is initially the topmost function in the traceback. BOFF provides two commands which let you move the current function context either up or down in the traceback list. For example, suppose you have the following traceback.

00221 open (0150)    rets to 003306 (&readf +0410)
00151 readf (01162,0777777777766) rets to 001160 (&func+010)
00147 func (0777777777766) rets to 001140 (&main +06)
00145 main (056150000132)    rets to 001321 (...... +0114)

The program has died in the library routine OPEN, which was called from another library function, READF. In fact, the cause of this particular abort was a bad argument to READF. To begin with, BOFF will be examining the arguments and variables of OPEN. Since OPEN is a library routine, it probably has no debug tables. To go down the stack and examine the arguments and variables of "main", you can tell BOFF to move down three functions with the command

3:d

From there, to look at the local variables of FUNC, you would tell BOFF to move up one function by typing

1:u

As you can see, context positioning on the stack is specified by giving the number of levels to move, followed by a colon, followed by the direction which is either 'u' or 'd'.

You might wonder what happens if you tell BOFF to move too far up or down. In this case, BOFF will leave you in a place which it considers reasonable. To find out where, type

:va

which will show the name of the function whose context is current. If there is a debug table for the function, BOFF will also print the names and values of the arguments, the auto variables, and the labels of the function. "va" means "View Autos".

21.2 Examining Memory Locations:

To display memory locations in BOFF, you can use a command with the following format.


where the square brackets indicate that an item is optional.

The "addr" must be an Lvalue, which means that it must be in either the form "name" or the form "*nnn", where "nnn" is a number. If you do not supply an "addr", BOFF uses the last address displayed plus one.

"nwds" is usually a number, and gives the number of words of memory to be displayed. If no number is specified, the default is one word.

The "mode" is a possibly null string of characters, which can include

a - address format
b - BCD constant
c - ASCII character constant
d - decimal
f - floating point
i - interpret as machine instruction
o - octal
s - display B string 

If the string is null, the mode is octal. If the character '\' is not available on your terminal, you can use '"' instead. If "mode" is not specified in a display command, the most recent mode is used. The initial mode is octal.

To display the contents of a variable called "x", you might type


To print its contents in octal, BCD, and ASCII, you would say


To display the contents of the eleven word vector "y", you would enter


To examine the vector in octal and address format, you would type


To examine the first 20 words of the function "main", you would say


The Lvalue of the expression "main" is the address of the routine "main", which is where BOFF starts its display. To display the next 20 words, enter one of the following two alternatives.


Since no "mode" is given, the most recently specified display mode is used.

To display 20 words, starting from location 110 (octal), you would type


When you go on to read the full BOFF documentation, you will find that you can construct arbitrary expressions composed of names, integer numbers, and a subset of the B language operators.

21.3 Tracing a Program:

At this point, we have covered the basic information you need in order to use BOFF on an abort file. It is more likely that you will want to use BOFF on your program as it is running. Not only can you use everything mentioned so far while the program is running, but you can also set breakpoints and execute function calls from BOFF.

To run your program with BOFF, give the command

boff run "program arg arg ..."

where the "args" are the arguments you would normally pass to your program when using the GO command. For example,

boff run "tabs +info <tabfile tabs=5,15"

could be used to run our tab expansion program under BOFF. It will take BOFF a few seconds to get things set up, after which it will set a breakpoint at "main" so that you will get control in BOFF just before "main" is executed.

When you get control, the current function under examination by the ":va" command will be "main", the function at which the breakpoint is placed. With our tab expansion program, ":va" will print

i        -      3A: 0145170160154
c        -      4A: 057056056057
argc     -      1A: 03
argv     -      2A: 03041623
ptr      -      6A: 0164157152057
type     -      5A: 0142057164165
#temp    -      7A: 0141160160143

As is apparent, everything but "argc" and "argv" contains garbage when you first enter "main". "argc" holds the number of arguments, in this case 3, and "argv" points to the argument vector.

When running a program like this with BOFF, the commands for moving up or down the stack, for examining memory, and for printing a traceback work as described in previous sections.

BOFF makes it very convenient to put a breakpoint on a function. If your program consists of a number of small functions rather than one monolithic "main" function, it will probably be easier to debug.

To put a breakpoint on a function, use the ":bf" command as follows.

name:bf

where "name" is the name of the function (e.g. "exptab:bf"). To delete such a breakpoint, use the command


To continue execution after pausing at a breakpoint, you can use the command


You may also want to find out what value a function returns. You can set a function return breakpoint by typing the command


This will result in a break at the moment the current function returns. Note that you cannot cause a break at the return of any other function but the current one, even though you can cause a break at the start of any function.

Finally, BOFF allows you to call any function defined in your program and to supply it with arguments. For instance, you might say

printf("%c %c*n", 'abcd', c )

This example shows that BOFF can not only read numbers and variable names, but can also read B string and character constants. BOFF will always print out the value returned by the called function. If the callee does not explicitly return a value, BOFF may still print a returned value, but this is essentially arbitrary and can be disregarded.

Appendix A: Binding Strength of Operators

Operators are listed from highest to lowest binding strength; there is no order within groups. Operators of equal strength bind as indicated, either left to right (denoted by [LR]) or right to left (denoted by [RL]).

   [LR]   name const primary[expr] primary(arglist) (expr)
   [RL]   ++ -- * & - ! ~ #- # ## (unary)
   [LR]   >> <<
   [LR]   &
   [LR]   ^
   [LR]   |
   [LR]   * / % #* #/ (binary)
   [LR]   + - #- #+
   [LR]   == != > < <= >= #== #!= #> #< #<= #>=
   [LR]   &&
   [LR]   ||
   [RL]   ?:
   [RL]   =  +=  -=  etc.  (all assignment operators)

Appendix B: B Compiler Error Messages

This is a list of diagnostics known to be generated by the B compiler. There may be others.

In each description, "nn" means a line number, while "name" is some identifier name. The name of the source file is usually given as well.

Any message not preceded by "warning: " is a fatal error. If there is a fatal error, neither the loader nor the random library editor will be called.


syntax error at line nn [in file <name>]
This is the most common diagnostic and it could mean almost any kind of error. Most often, it means that a semi-colon is missing or that the number of opening curly braces "{" does not match the number of closing curly braces "}". In the second case, the line number will be one more than the number of the last line in the last file being processed. This error may also occur if you neglect to end a string constant, character constant, or comment. You also get this message if you use a keyword in an inappropriate context, if you neglect to define a manifest, or if you attempt to redefine a manifest.
<identifier> undefined in function <name>
An identifier in the named function has not been referenced in an extrn or auto statement and has not been used as a label. The line number given is the last line of the function being compiled.
warning: /* inside comment ...
This is a warning only, but there will probably be a syntax error later on, since comments may not be nested. After reading a "/*", the compiler skips all text until a "*/" is encountered. If there is a comment inside a comment, the compiler will attempt to compile the remainder of the outer comment.
end of file in comment
This usually indicates that you forgot to end a comment with the terminating "*/".
warning: newline in constant not preceded by "*"
The most likely cause of this error is forgetting to terminate a string or character constant with the appropriate delimiter. If this is the case, you will surely get a syntax error later. If you want a "real" new-line inside the constant, but no warning, use the escape sequence '*n'. If the constant is a string constant which is too long to fit on one line, precede the new-line with a '*'; the new-line will be discarded. When the warning is issued, the new-line is kept.
invalid octal constant
An integer beginning with the digit zero, which is thus assumed to be an octal constant, contains a character other than the digits zero through seven.
character constant too long
A character constant may not contain more than four characters, although each character may be a two-character escape sequence.
bcd constant too long
A BCD constant contains more than six characters.
exponent too large in constant
The exponent of a floating point constant is too large or too small to represent in the hardware.
attempt zero division
In evaluating the constant part of an expression, the right operand of a division or remainder operator was found to be the constant zero.
invalid & prefix
The "&" operator has been used in an invalid context, such as "&x = y".
warning: found ++r-value, warning: found --r-value
You get this if you say something like "++x++".
invalid $ escape sequence
An escape sequence beginning with '$' is not known to the compiler.
invalid unary operator
The compiler discovered you trying to use a binary operator in a unary manner.
invalid =, invalid *=, invalid >>=
The expression on the left hand side of an assignment operator does not have an Lvalue. (The term "Lvalue" is explained in the B Reference Manual.)
invalid ++, invalid --
The expression operated upon by the "++" or "--" operator does not have an Lvalue.
invalid label
A name used as a label has previously been declared as external or auto in the current function.
invalid break
The compiler found a break statement which was not inside a for, while, do-while, repeat, or switch statement.
invalid next
The compiler found a next statement which was not inside a for, while, do-while, or repeat statement.
invalid constant expression
This will happen if you try to use a string constant in a constant expression.
invalid operator
This is one of those "cannot happen" messages. If it does happen, please submit an error report.
auto array too large
You attempted to declare an auto vector with a dimension greater than 1000 words. It is better to use an external vector or else GETVEC the space, since auto variables are allocated on the stack and stack space is limited.
extrn array too large
This will happen if you declare an external vector like "x[3.0];". B will treat the internal representation of the floating point as an integer, with unfortunate results.
invalid case
A case label is not inside a switch statement.
invalid default
A default label is not inside a switch statement.
default already supplied
A switch statement has more than one default label.
invalid case operator
When an upper or lower bound is specified in a case heading, the only valid operators are <, >, >=, and <=.
%filename ignored- too many open files
This usually happens when you include a file which includes itself.
bad input character: <ddd> (octal)
A character encountered in the input stream outside a string or character constant has no meaning for the compiler. This might be a backspace or some control character typed in by mistake. Since it may be a non-printing character, the value of the offending character is displayed in octal.
rewrite this expression
A subscripting expression is too involved for the code generator to handle. Try breaking up the expression into more than one statement.
manifest nesting too deep
This will occur when you have manifest constants whose evaluation involves other manifest constants. The error happens if you have too long a series of manifest definitions, each of which is defined in terms of the previous manifest. This is all right in GMAP, but not in B.
warning: program size > 32k
One of the object decks generated will require more than 32K words to load. You may get this warning if you declare several very large external vectors. However, it might also mean the loader will be aborted by TSS due to "not enough core to run job".
expression too complex, no tree space, no stack space
An expression is too complex for the compiler to evaluate. Try simplifying it by breaking it up into two or more expressions.
The constant <ddd> occurs in two case labels
The same constant appears in more than one case heading in a switch statement. The value of the offending constant is printed in decimal.
the upper range <ddd> overlaps the lower range <ddd>
The compiler has detected overlapping bounds inside a switch statement. The values of the bounds are displayed in decimal.
The constant <ddd> is in the range <ddd>::<ddd>
The compiler has detected a case constant which is in the range of a range case or relational case. The numbers are given in decimal. If something conflicts with a relational case, the bounds generated for the relation are shown. For example, the bounds for "case > 0:" would be "1::34359738367".
Initializers nested too deeply
An external declaration has initializers in braces nested to a depth greater than seven.
external redefined, auto variable redefined, label redefined, auto array name redefined
The compiler has detected an attempt to redefine a symbol which has already been defined in the current function body.
no space for symdef
There are too many external definitions; try dividing them into two groups either by compiling them separately or by placing a function in between. This error is almost never encountered.
no space for symref
There are too many external references in a function definition; try simplification. This error is almost never encountered.
warning: #<text> ignored
A line beginning with a '#' does not contain a recognizable compiler directive. The line is ignored.

TSS Loader Warning Messages:

<w> name undefined
This loader message indicates that an external variable referenced by one of your functions or a library function remains undefined after all libraries have been searched. If your program references the named external, it will abort with a MME fault in TSS, or with a USER'S L1 MME GEBORT in batch.
<w> name loaded previously
The loader has discovered a function or external with the same name as one already loaded. The most probable reason is that you have two or more different names which end up being the same when truncated to six characters. The loader ignores all but the first occurrence of any such name. Make sure all your externals and function names are unique in their first six characters.
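
A quick way to check a pair of names before loading is to compare their first six characters. The C sketch below does this; names_collide is a hypothetical helper of our own, and we assume that the loader's comparison is an exact character-by-character match.

```c
#include <string.h>

/*
 * Return 1 if two different names become the same name when
 * truncated to six characters, 0 otherwise.
 */
int names_collide(const char *a, const char *b)
{
    return strcmp(a, b) != 0 && strncmp(a, b, 6) == 0;
}
```

The externals "tabseen" and "tabsexp" pass this check, since they differ in their sixth character, whereas names such as "linelength" and "lineleft" would both be truncated to "linele".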

Appendix C: Escape Sequences

There are two sets of escape sequences, one for use inside string or character constants and the other for use outside.

Appendix D: Partial Index of B Library Routines

The following is a list of some of the routines currently in the B library. The list is by no means complete; we have simply chosen some of the more commonly used B library functions so that the reader can get an idea of what the B library can do.

check for valid abbreviations
print an error message and abort
parse string into arguments
reference or change the current read unit
set tabs for the current output unit
reference or change the current write unit
absolute value of an integer
a simple garbage collecting storage allocator
check if a character appears in a string
call arbitrary function with arbitrary arguments
convert an ASCII string to a BCD vector
call FORTRAN program from B routine
call FORTRAN function and return floating-point value
close currently open unit
compare two B strings
concatenate a series of strings
copy contents of one vector into another
return current date in ASCII
convert date in ASCII string to a standard form
test input end of file, or write output end of file
compare two strings for equality
type an error message, then exit
end job and return status
useful externals in the B library
read a character
turn ASCII date into the form mm/dd/yy
read a line from an input unit
read a string from an input unit
get userid of current user
dynamically allocate a vector
the B histogram package
return the length of a string
return the current line number of a file
turn alphabetics in a string to lower case
maximum value of list of integers
minimum of a list of integers
return number of arguments to a function
count number of times break key was hit
check for null string
open a file or string for I/O
do a PRINTF into a string
formatted print
prompt for input at terminal
output a character
generate pseudo-random numbers
unit-oriented block I/O
formatted character stream input
non-local goto
release storage used by a vector
convert ASCII pathname to BCD catfile stack
extract delimited substring of a string
a shell sort
convert standard date to ASCII string
wait for specified interval
cause line numbers to be stripped on input
execute a TSS command
specify tab stops for the current output unit
get time in pulses, or convert it to a string
trim trailing blanks off a string
initialize a B vector to some value