A Computer-Aided Learning COBOL Package
This article describes a COBOL compiler and run-time interpreter for teaching COBOL in a computer-aided learning environment. The compiler converts COBOL source code into an intermediate pseudo code that is executed by the run-time interpreter. This package is the first implementation of COBOL written in BASIC for an unexpanded BBC Micro with a 32K memory.
Overview
COBOL is a high-level computer programming language for implementing commercial applications. This report describes the work carried out to implement and test a COBOL compiler and run time interpreter for teaching COBOL in a computer-aided learning environment. The compiler converts COBOL source code into an intermediate pseudo code that is executed by the run time interpreter. Both the compiler and run time interpreter are written in BASIC and run on an unexpanded BBC Micro model B. This report describes the first implementation of COBOL for an unexpanded BBC Micro with a 32K memory.
Contents
- 1. Introduction
- 2. The COBOL Language of this Implementation
- 3. Using the Package
- 4. Design
- 5. Testing and Evaluation
- 6. Conclusions and Further Work
1. Introduction
A meeting at the University of Pennsylvania Computing Centre, Philadelphia in April, 1959 was held to consider the desirability and feasibility of establishing a common programming language for implementing business applications. This meeting concluded that:
- developing and maintaining business applications would be easier with a common business language;
- existing applications could be transferred to more powerful computers with smaller conversion costs;
- documentation could be quickly amended and extended to meet new management requirements;
- programs in a common language should be self-documenting, enabling relatively inexperienced staff to read and write programs, which is important when a large suite of programs must be maintained by current staff; and
- staff would be trained more quickly.
A May 1959 meeting at the Pentagon formed the Conference on Data Systems Languages (CODASYL), the organization that would produce the common language. In 1960, the US Department of Defence, at the time the largest user of computers, produced an initial specification of the Common Business Oriented Language (COBOL), giving birth to the world’s most popular commercial programming language to date.
COBOL has two features that make it particularly suitable for implementing business applications. First, COBOL is not complicated: programs are written in a form nearer to English than other high-level languages, which makes the language easy for relatively inexperienced programmers and business users to learn. Second, records are the basis of all commercial programs, be they master records in a payroll or stock records in a factory stock control system, and COBOL has powerful facilities for processing and manipulating record data structures.
Since its introduction in the early 1960s, COBOL has steadily increased in popularity until reaching its peak when 95% of all commercial applications were written in the language. Although the number of commercial applications written in COBOL has decreased to 85%, the popularity of the language is set to remain high with 75% of commercial applications still being written in COBOL by the middle of the next century.
COBOL is in a privileged position because its popularity grew in the early days of commercial programming. Because COBOL is so popular and so widespread, firms request that their computer programs are written in COBOL. Firms are reluctant to use newer languages because there is a ready supply of experienced COBOL programmers and far fewer programmers with experience of other languages. COBOL’s popularity makes it difficult for new languages to break into the commercial market. However, the decrease in the number of commercial applications being written in COBOL shows that other languages are filtering through.
1.1 Aims and Objectives
The aims and objectives of this project are:
- to implement a limited version of COBOL for use by anyone wanting to learn a simple version of COBOL, specifically A level Computer Science students;
- to make COBOL easy to learn by using the basic structure of a standard COBOL program without requiring COBOL’s strict formatting rules that hinder users new to the language;
- to compile and run relatively simple COBOL programs using a compiler to produce a pseudo code that is saved on disk and executed by a run time interpreter;
- to make the package easy to use by selecting the compiler and run time interpreter programs with a menu;
- to enable users to view the pseudo code produced when a COBOL program is compiled; and
- to test the compiler and run time interpreter by running a simple payroll application written in COBOL.
1.2 Description of the Problem
The problem is to run COBOL on a standard, unexpanded BBC Micro Model B. It is possible to run COBOL on a BBC Micro but current implementations require an Acorn Second Processor, which costs approximately £150. On top of this is the cost of the COBOL package itself, which is between £70 and £150. This high cost puts COBOL out of reach of the majority of BBC Micro users who want to learn and use the language.
2. The COBOL Language of this Implementation
This package is aimed at users learning COBOL from scratch or improving a basic knowledge of the language. This documentation contains only a basic description of COBOL and the package should be used with books or worksheets designed for teaching COBOL.
All versions of COBOL have basic elements in common that are compulsory in all programs. COBOL programs are divided into four divisions that are written in the following order:
- the IDENTIFICATION division documents the program;
- the ENVIRONMENT division lists the computers and peripherals required to compile and run the program;
- the DATA division specifies the data records processed by the program; and
- the PROCEDURE division contains the instructions that manipulate the data records.
In full implementations of COBOL, programs must be laid out according to strict formatting rules that specify the number of spaces at the beginning of each line. Because this implementation is for teaching COBOL, the strict formatting has been relaxed to just three simple rules:
- any number of spaces is allowed at the beginning of a line;
- each line must be terminated with a full stop and a carriage return; and
- no carriage returns are allowed between words or statements.
These simple formatting rules mean that users will spend less time correcting compilation errors caused by incorrect formatting and more time learning about writing COBOL programs.
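The three relaxed rules can be illustrated with a small Python sketch. This is not the package's BASIC code: the function name `split_statement` is hypothetical, and the sketch assumes words are separated by spaces and ignores quoted literals that contain spaces.

```python
def split_statement(line):
    """Split one source line into words under the three relaxed rules."""
    text = line.strip()               # rule 1: leading spaces are ignored
    if not text.endswith("."):        # rule 2: a full stop must end the line
        raise ValueError("Full stop expected")
    # rule 3: a statement never spans lines, so the words are all here
    return text[:-1].split()


print(split_statement("     MOVE AGE TO SAME_AGE."))
```

A line without its closing full stop raises an error named after the compiler's "Full stop expected" message.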
2.1 The Identification Division
The Identification division documents the program with its name, its author and any comments, which are called remarks in COBOL:
The statements marked with an asterisk (*) are compulsory in this implementation.
Because this is a CAL package, all the items in the Identification division are compulsory to help learners document and understand their programs. The Identification division is ignored by the compiler.
2.2 The Environment Division
The Environment division specifies the computer on which the program is to compile, the computer on which the object code program is to run and lists the peripherals needed to run the program:
In the Configuration section the Source Computer statement specifies the compiling computer and the Object Computer statement specifies the executing computer. In this implementation, the source and object computers are the same, i.e. the BBC Micro. In the File Control section of the Input Output section, the SELECT...ASSIGN statement declares which files use which peripherals. There must be at least one SELECT...ASSIGN statement that selects a disk file.
2.3 The Data Division
The Data division describes the format of the data records processed by the program:
The Data division is divided into the File section and the Working Storage section. The File section describes the files that are used for input and output. The Working Storage section contains the internal working data used to execute the statements in the Procedure division. The record used in this section follows the same format as the other records. The Working Storage section is compulsory because all programs require it.
The structure of the data records defined in the Data division depends on the application but they must follow a standard pattern:
The first line of a record is the file description (FD) that specifies the external filename (MASTER) of the record if it’s defined as a disk file. If the record is defined as a keyboard file, a filename must be supplied but it’s ignored by the compiler.
The LABEL RECORDS ARE OMITTED clause has no effect in this implementation but is compulsory to make programs more like standard COBOL.
The remainder of the record is composed of the data fields. A data record can have up to 15 fields, and each field has four parts:
The level number specifies the position of the field in the record. Level numbers are explained in more detail below.
The field name is the name by which the field is referenced in the rest of the program. All field names and file descriptions must be unique within the same program because the compiler would be unable to differentiate fields and file descriptors with the same name.
The PIC keyword precedes a picture definition that specifies the length and data type of the field. Picture definitions are explained in more detail below.
The VALUE IS clause is optional and allows an initial value to be assigned to the field. If the VALUE IS clause is absent, alphabetic and alphanumeric fields will be null and numeric fields will be zero.
Field Level Numbers
Each field has a level number (01, 02, 03, etc.) that specifies the position of the field within the hierarchical record structure. A level number is a two digit integer and must start with zero if it’s less than ten. Fields at the same level in the hierarchy must have the same level number. Sub-fields must have a level number higher than the fields above them in the hierarchy.
Fifteen fields are available in this implementation so the maximum level number is 15 (if steps of one are used). More fields would be available if more memory was available.
Field level numbers determine how data is moved from one field to another. Data is moved between the following records:
in three ways:
- the instruction MOVE AGE TO SAME_AGE copies the value of AGE to SAME_AGE;
- the instruction MOVE NAME TO SAME_NAME copies FORENAME to SAME_FORENAME and SURNAME to SAME_SURNAME; and
- the instruction MOVE NAME TO FULL_NAME concatenates FORENAME and SURNAME and copies the result to FULL_NAME.
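The three kinds of move can be sketched in Python. This is only an illustration: the function `move` and the dictionary layout are hypothetical, a group field is modelled as a dictionary of its sub-fields, and the fixed-width padding used by the real package is ignored.

```python
def flatten(value):
    """Concatenate the sub-fields of a group, in declaration order."""
    if isinstance(value, dict):
        return "".join(flatten(v) for v in value.values())
    return value


def move(source, target):
    """Return the new value of target after MOVE source TO target."""
    if isinstance(source, dict) and isinstance(target, dict):
        # group to group: copy sub-fields pairwise, in declaration order
        return {t_name: move(s_val, t_val)
                for (t_name, t_val), s_val in zip(target.items(), source.values())}
    if isinstance(source, dict):
        # group to undivided field: concatenate the sub-fields
        return flatten(source)
    # undivided field to undivided field: plain copy
    return source


name = {"FORENAME": "PETER", "SURNAME": "STEVENS"}
print(move("37", ""))                                           # AGE to SAME_AGE
print(move(name, {"SAME_FORENAME": "", "SAME_SURNAME": ""}))   # NAME to SAME_NAME
print(move(name, ""))                                           # NAME to FULL_NAME
```

Only the three cases described in the text are covered; moving an undivided field into a group is not modelled.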
In standard COBOL, undivided fields in the working storage section have level number 77. In this implementation, all field level numbers follow the same pattern in each section. I felt a consistent field numbering system would make it easier to learn COBOL.
Picture Definitions
Each field is defined by a picture definition that specifies the number of characters the field can hold and the data type of those characters. This implementation of COBOL provides three data types: alphabetic, numeric and alphanumeric.
Alphabetic fields contain only alphabetic characters (the 26 letters of the alphabet). The number of As represents the number of alphabetic characters allowed in the field:
A(60)
AAAAAA
AA
Numeric fields contain only the digits 0 to 9, an optional sign, S, and an optional decimal point, V. The number of 9s represents the number of numeric characters allowed either side of the decimal point in the field:
Picture | Size | Numeric Range |
---|---|---|
9(4) | 6 | 0000 to 9999 |
S99999 | 6 | -99999 to +99999 |
S9999V99 | 5 | -9999.99 to +9999.99 |
Alphanumeric fields can contain alphabetic and numeric characters. The number of Xs represents the number of alphanumeric characters allowed in the field:
X(20)
XXXX
X
A string of two or more A, X or 9 characters can be replaced by a shorthand notation that specifies the number of characters in brackets following the character. For example, 9(6) is equivalent to 999999; and X(10) is equivalent to XXXXXXXXXX. However, it’s impossible to specify a decimal point using this shorthand when defining a numeric data type.
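The shorthand expansion and the three data types can be sketched in Python. The function names are hypothetical and this is not the package's BASIC code. The sketch counts only A, X and 9 positions, and it treats a picture that mixes the bracket shorthand with V as a bad picture, which is one reading of the restriction above.

```python
import re


def expand_picture(pic):
    """Expand shorthand such as 9(6) into 999999."""
    return re.sub(r"([AX9])\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)), pic)


def describe_picture(pic):
    """Return (data type, number of A, X or 9 positions) for a picture."""
    if "(" in pic and "V" in pic:
        # Assumed reading: the shorthand cannot be combined with a decimal point.
        raise ValueError("Bad picture")
    full = expand_picture(pic)
    if re.fullmatch(r"A+", full):
        return ("alphabetic", len(full))
    if re.fullmatch(r"X+", full):
        return ("alphanumeric", len(full))
    if re.fullmatch(r"S?9+(V9+)?", full):
        return ("numeric", full.count("9"))
    raise ValueError("Bad picture")


print(expand_picture("9(6)"))
print(describe_picture("S9999V99"))
```

Whether the sign and decimal point occupy storage in the real package is not modelled here.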
2.4 The Procedure Division
The Procedure division contains the instructions of the program that manipulate the data records specified in the Data division:
As with standard COBOL, all arithmetic statements such as Add and Multiply require that the fields they operate on are in the Working Storage section.
The COBOL statements available in the Procedure division are described using the following key:
Symbol | Meaning |
---|---|
<var> | a variable |
<file> | the name of a file |
<word> | a sequence of characters |
<literal> | a quoted string |
<result> | the result of an operation which must be a <var> |
<num> | numeric value |
| | or |
[] | optional |
Data operating constructions:
Conditional constructions:
Filing constructions:
In a full implementation of COBOL, the AT END clause can be followed by other COBOL clauses, such as:
In this implementation, however, the AT END clause must be followed by a GO TO clause:
2.5 Limitations of this Implementation
COBOL is normally run on mainframes and minicomputers. Implementing a version of COBOL for a microcomputer with a small 32K memory seems impossible. Several limitations had to be imposed to fit an implementation of COBOL into 32K, including omitting language features that are not required for novice COBOL users:
- No arrays. Although arrays are an important part of any programming language, they are not essential in a CAL package because users will be writing fairly simple programs.
- No algebraic expressions. Algebraic expressions such as COMPUTE area=pi*r^2. have not been implemented because of memory limitations. This is not too disadvantageous because algebraic expressions, like arrays, are not essential for a CAL package because expressions such as MULTIPLY HOURS BY RATE GIVING PAY. are more common in COBOL and more instructive when learning the language. Furthermore, algebraic expressions are not generally used in business applications.
- A limit of fifteen fields per data record.
Several other limitations were imposed:
- No random access files. The package can only create and process sequential files. Random access files are more complicated to implement than sequential files and there was insufficient time to include them in the package. Random access files can be added as an enhancement.
- A limit of four external disk files. This implementation supports only four external disk files because the Acorn Disk Filing System (DFS) used by the BBC Micro supports only four simultaneous disk files.
- No linkage or report section. The COBOL data division has up to four sections written in the following order:
  - File section;
  - Working Storage section;
  - Linkage section;
  - Report section.

  Because the package is for teaching COBOL, only the File and Working Storage sections have been implemented. The Linkage and Report sections were not implemented to save time and memory and because the File and Working Storage sections are the most instructive of the four sections.
- A program has a maximum of 20 literals and each literal can be no longer than 30 characters;
- A program has a maximum of 20 labels and each label can be no longer than 15 characters;
- A line of code can be no longer than 15 words and each word can be no longer than 15 characters;
- Filenames can be no longer than 7 characters;
- Records can be no longer than 80 characters; and
- A compiled program can be no longer than 168 bytes of pseudo code.
3. Using the Package
The package is divided into three programs: the main menu, the compiler and the run time interpreter. The main menu program enables the user to select the compiler or the run time interpreter or to exit the package. The compiler converts the user’s COBOL program into an intermediate pseudo code which is executed by the run time interpreter.
3.1 Hardware and Software Requirements
Although the package is a self contained suite of programs, it relies on some standard items of hardware and software.
The package requires the following hardware:
- a standard BBC Micro model B;
- a monitor or display unit;
- a single 40 track disk drive; and
- an optional printer.
The package also requires the following software:
- a BBC Basic interpreter;
- a text editor or word processor that produces pure ASCII files such as Wordwise+ or an Acornsoft ISO Pascal text editor; and
- a standard Acorn Disk Filing System (DFS).
3.2 Source and Object File Conventions
The package uses the following convention for storing files:
- source code files must be stored in the “C” directory; and
- pseudo code files are stored in the “P” directory.
This convention enables files to be easily recognised on disk and enables the same file name to be used for the pseudo code file as for the source code file.
The compiler expects all source code files to be stored in the “C” directory. The compiler stores the pseudo code file in the “P” directory within which the run time interpreter expects to find all pseudo code files.
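The directory convention amounts to prefixing one program name with "C." or "P.". A minimal Python sketch follows, with hypothetical function names. The 7 character limit and the "Invalid filename" message are taken from elsewhere in this report; how the package actually validates names is not shown.

```python
MAX_FILENAME = 7  # filenames can be no longer than 7 characters


def check_name(name):
    """Reject names that break the filename length limit."""
    if not 1 <= len(name) <= MAX_FILENAME:
        raise ValueError("Invalid filename")
    return name


def source_file(name):
    """Source code lives in the C directory."""
    return "C." + check_name(name)


def pcode_file(name):
    """Pseudo code lives in the P directory under the same name."""
    return "P." + check_name(name)


print(source_file("EXAMPLE"), pcode_file("EXAMPLE"))
```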
For information about disk directories refer to the Disk Filing System (DFS) manual.
3.3 The Main Menu
When the main menu program has loaded, the user is presented with three options:
- Compiler
- Run Time Interpreter
- Quit
Option 1 loads the compiler. Option 2 loads the run time interpreter. Option 3 allows the user to leave the package; all loose ends are tied up and the user is returned to the familiar prompt of the BASIC environment:
BASIC >_
To select an option, press the key of the number that corresponds to the menu option (the RETURN key is not required).
If the compiler’s data file (CDATA) or the run time interpreter’s data file (RTIDATA) is not on the disk the menu will display an error and won’t let the user continue. At this point the data files must be put on the disk and the package restarted.
3.4 The Compiler
When selected from the menu, the compiler loads and displays the following message on the screen:
INITIALIZING...
while it loads its data file. This is a short process and the user need do nothing except watch. When the compiler has finished loading, the user must configure it by providing the following information:
- the name of the disk file that contains the source code of the COBOL program;
- the name of the disk file in which the object code will be stored;
- whether to pause after a compilation error or keep on compiling;
- whether to show the pseudo code generated by the compiler after each line is compiled; and
- whether to output the compilation to a printer.
Compiling a Program
As a COBOL program is compiled, each line of code is displayed as it is read from the source text file:
CAL COBOL Compiler
compiling...
IDENTIFICATION DIVISION.
PROGRAM_ID EXAMPLE.
AUTHOR JEFFREY MORGAN.
DATE_WRITTEN MARCH_1989.
...
If selected, the pseudo code is displayed after each line of COBOL:
CAL COBOL Compiler
compiling...
IDENTIFICATION DIVISION.
1 33
PROGRAM_ID EXAMPLE.
22 -1
AUTHOR JEFFREY MORGAN.
23 -1 -1
DATE_WRITTEN MARCH_1989.
24 -1
...
After compilation, the file name of the program, the number of compilation errors and the number of bytes of pseudo code are displayed:
-------------------------------------------------------
Compilation of <C.EXAMPLE> complete.
6 compilation error(s).
Pcode is 76(+) bytes long.
-------------------------------------------------------
The plus in brackets reminds the user that although the pseudo code is 76 bytes long, the symbol table and the literals are also saved in the pseudo code file.
When printing the compilation of a COBOL program, the print out is preceded by the following header:
=======================================================
= CAL COBOL Compiler                                  =
=                                                     =
= Compilation of file <C.EXAMPLE>                     =
=                                                     =
= Documentation header                                =
=======================================================
The header remarks—here “Documentation header”—are specified by the user.
After compiling a program, the user will be returned to the main menu. Alternatively, pressing the ESCAPE or BREAK keys when using the compiler will return the user to the main menu.
Compilation Error Messages
The compiler produces an error message whenever one is required:
- Syntax
- No such variable
- Types incompatible
- Label exists
- No such label
- Variable already defined
- Bad picture
- Too many files
- Undefined file
- No such file
- Not in working storage
- Line too long
- Number too long
- Undeclared file
- Bad level
- Invalid level
- Invalid filename
- No quotes (missing quotes)
- Missing label
- Too many fields
- Too many labels
- Too many literals
- No pcode space left
- Full stop expected
- Word too long
- Record too long
- Literal too long
Errors 20 to 23 in the list above (Too many fields, Too many labels, Too many literals and No pcode space left) are fatal and the compilation will stop when one of them occurs; otherwise, the compilation will continue until the end of the program.
The compiler does its best to continue when a compilation error occurs. However, as with most compilers, when an error does occur, several further errors may be caused by the first error. For example, if a Bad Picture error occurs, the field won’t become part of the record and when the field is referenced later in the program a No Such Variable error will occur. It’s best to debug each error as it occurs in the program because subsequent errors reported by the compiler often disappear when the first error in a program is fixed.
Error messages are displayed inside chevrons:
>>> SYNTAX ERROR <<<
In the unlikely event of an error occurring in either the compiler or the run time interpreter, the error is displayed in the following format:
Sorry, can't continue.
An error has occurred in the Compiler/Run time interpreter itself.
Disk changed at line 3620.
>_
If this happens, the user should reload the package and start again.
3.5 The Run Time Interpreter
When selected from the menu, the run time interpreter loads and displays the following message on the screen:
INITIALISING...
while it loads its data file. This is a short process and the user need do nothing except watch. When the run time interpreter has finished loading, the user must configure it by providing the following information:
- the name of the file containing the compiled COBOL program;
- whether to display the pseudo code on the screen while the program is running; and
- whether to print the execution.
Executing a program
The run time interpreter displays the result of each Display statement on the screen:
CAL COBOL Run Time Interpreter
running...
PROCESSING DATA
FRED SMITH
NEW PAY = 362.23
JIM PETERS
NEW PAY = 647.57
If selected, the run time interpreter also displays the pseudo code as it is read from the pseudo code file:
running...
PROCESSING DATA
14 1 -1
14 -1
FRED SMITH
14 102 -1
NEW PAY = 362.23
14 2 502 -1
14 -1
JIM PETERS
14 102 -1
NEW PAY = 647.57
14 2 502 -1
14 -1
Whenever a COBOL program requires input, the run time interpreter prompts the user:
>>_
The user should enter the required information and press the RETURN key.
When printing the execution of a COBOL program, the print out is preceded by the following header:
=======================================================
= CAL COBOL Run Time Interpreter                      =
=                                                     =
= Run of file <C.EXAMPLE>                             =
=                                                     =
= Documentation header                                =
=======================================================
The header remarks—here “Documentation header”—are specified by the user.
After running a program, the user has the option of either running another program or returning to the main menu. Alternatively, pressing the ESCAPE or BREAK keys when using the run time interpreter will return the user to the main menu.
Run Time Error Messages
The run time interpreter produces an error message whenever one is required:
- Number too big
- End of file
- File closed
- File already open
- File write only
- File read only
- No such file
- File can’t extend
- Disk is full
- File is locked
- Catalogue full
- Disk is read only
- Division by zero
If an error occurs when running a program, the run time interpreter displays the error, halts the execution of the program and returns the user to the main menu.
4. Design
4.1 Description of the Problem
The programming problem is to input a stream of characters from a text file on disk containing a COBOL program, group the characters into the words of a line of COBOL, check the syntax of the line, add the information in the line to the symbol table and then convert the line into pseudo code. After the pseudo code and symbol table have been constructed they must be saved in a disk file, known throughout the package and this documentation as a pseudo code file.
The compiler must analyse COBOL programs as fast as possible to avoid lengthy compilation times, and must produce pseudo code that is as simple as possible to leave only the task of executing the program to the run time interpreter which, in turn, must execute the pseudo code as quickly as possible. Both the compiler and the run time interpreter must manage memory efficiently.
4.2 Design and Implementation Choices
Several key decisions were made before implementing the package:
- To use a menu rather than a command line to access the compiler and run time interpreter programs. Menus are easier to use because the user is limited to a few choices and doesn’t need to learn new operating commands to use the package on top of the COBOL language. By using a menu with just a few options, the user can move simply and quickly between programs and the scope for user error is greatly reduced.
- To implement the package as a compiler and run time interpreter rather than an interpreter. The main constraint of the package is the limited 32K memory of a standard BBC Microcomputer. To produce an interpreter in its own right would be practically impossible. If it were attempted then some form of virtual programming using overlays would be required. Implementing overlays would be complicated, time consuming and would overshadow the task of developing a COBOL package. The only practical approach was to divide the task into two distinct stages, each implemented by a separate program: compilation and run time execution of the object code.
- To use a pseudo code instead of producing pure machine code. Producing pure machine code would be beyond the scope of this project. An advantage of pseudo code is that only the run time interpreter would need to be modified to run compiled COBOL programs on another computer.
- To use a segmented isolation development technique. When designing and writing the package, each new procedure and function was written in isolation and tested with a small program rather than in the main program. Each new procedure and function was therefore bug-free when inserted into the main programs. Writing them in isolation reduced development time because only very small programs had to be tested.
- To use BASIC instead of another language. Two programming languages were available to implement the package: BASIC and Pascal. The package was originally to be written in Pascal but after a preliminary investigation of Pascal I decided to use BASIC for three reasons. First, strings are used throughout the package but Pascal has no string data type; BASIC has powerful string handling facilities. Second, Pascal doesn’t provide random access files so an indexed sequential filing system could not be implemented. Although the package in its present form does not offer them, indexed sequential files could be added as an enhancement in BASIC. Third, BASIC is an interpreted language so new procedures and functions could be tested and run immediately without a lengthy compilation stage.
- To not implement noise words. Standard COBOL allows users to add noise words that are ignored by the compiler to help make COBOL statements even more like English sentences than they already are. An example of a noise word is the word TO that could be added to the SELECT…ASSIGN statement in the Environment division:
SELECT <file> ASSIGN TO <disk> | <keyboard>
Noise words are not essential for learning COBOL, and novice users might be confused about which words are optional. Noise words can be learned later when the user is more familiar with COBOL.
4.3 Representation of Records
The data structure that holds the data records is an important part of the package because the compiler uses it to construct the symbol table, an internal representation used to compile the statements in the Procedure division that is also used by the run time interpreter to execute the program.
The following COBOL data record:
would be represented inside the compiler in the following table:
Field Number | Field Name | Length | Type |
---|---|---|---|
1 | MASTER | 0 | ¦29 |
2 | NAME | 0 | ¦34 |
3 | SURNAME | 10 | X |
4 | FORENAME | 10 | X |
5 | ADDRESS | 0 | ¦67 |
6 | NUMBER | 3 | X |
7 | ROAD | 10 | X |
8 | AGE | 2 | 9 |
9 | PAY | 7 | 9999V99 |
Although each field has a COBOL level number, the compiler numbers each field sequentially starting at one, as shown in the Field Number column. The name of the field as written in the COBOL source code is stored in the Field Name column. The Length column records the number of characters the field can contain. The Type column records which of the three data types can be stored in the field:
- A – Alphabetic
- 9 – Numeric
- X – Alphanumeric (alphabetic and numeric)
Fields that head a set of sub-fields have zero length and a type that begins with the ¦ character. This character precedes the number of the first sub-field followed by the number of the last sub-field. The sub-field numbers are not the COBOL record level numbers but the compiler’s internal sequential numbering shown in the Field Number column. In the table, field 1, MASTER, has type ¦29 indicating that MASTER is the beginning of a set of sub-fields starting with field 2 and ending with field 9. Similarly, NAME has the type ¦34 because it heads sub-fields 3 to 4 and ADDRESS has the type ¦67 because it heads fields 6 to 7.
Whenever a field is defined that has a higher level number than the previous field, the type of the previous field is set to ¦** to mark it as heading a set of sub-fields. When the number of the final sub-field is known, the ¦** is updated to record the numbers of the first and last sub-field.
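How the sub-field ranges follow from the level numbers can be sketched in Python. This sketch uses a simple look-ahead over the finished field list rather than the single-pass ¦** placeholder technique described above. The level numbers in `FIELDS` are assumed (the record's source code is not reproduced here), and `mark_groups` is a hypothetical name.

```python
# (level number, field name); levels are assumed for illustration
FIELDS = [
    (1, "MASTER"),
    (2, "NAME"), (3, "SURNAME"), (3, "FORENAME"),
    (2, "ADDRESS"), (3, "NUMBER"), (3, "ROAD"),
    (2, "AGE"),
    (2, "PAY"),
]


def mark_groups(fields):
    """Map each heading field number to (first sub-field, last sub-field)."""
    groups = {}
    for i, (level, _name) in enumerate(fields):
        j = i + 1
        # every following field with a higher level number is a sub-field
        while j < len(fields) and fields[j][0] > level:
            j += 1
        if j > i + 1:
            groups[i + 1] = (i + 2, j)   # compiler field numbers start at one
    return groups


print(mark_groups(FIELDS))
```

For this record the result pairs field 1 with sub-fields 2 to 9, field 2 with 3 to 4 and field 5 with 6 to 7, matching the ¦29, ¦34 and ¦67 types in the table.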
Each of the six available files has a string that holds the contents of all the fields in a record. Each field is located within the string using a pointer. The picture definitions in the Data division provide the data type and length of each field and enable the compiler to calculate the position of the pointer to the start of each field in the string. For example, the following pointers would be produced for the fields represented in the above table:
Field Number | Pointer | Length | Type |
---|---|---|---|
1 (MASTER) | 0 | 0 | ¦29 |
2 (NAME) | 0 | 0 | ¦34 |
3 (SURNAME) | 0 | 10 | X |
4 (FORENAME) | 10 | 10 | X |
5 (ADDRESS) | 20 | 0 | ¦67 |
6 (NUMBER) | 20 | 3 | X |
7 (ROAD) | 23 | 10 | X |
8 (AGE) | 33 | 2 | 9 |
9 (PAY) | 35 | 6 | 9999V99 |
If the SURNAME, FORENAME, NUMBER, ROAD, AGE and PAY fields of the record held the values STEVENS, PETER, 21, HIGH ST., 37 and 127.34, respectively, the compiler would construct the following string to represent the record:
A         B         C  D         E F
↓         ↓         ↓  ↓         ↓ ↓
STEVENS,,,PETER,,,,,21,HIGH ST.,,37,127.34
A is the pointer to the start of fields 1 (MASTER), 2 (NAME) and 3 (SURNAME). B is the pointer to the start of field 4 (FORENAME). C is the pointer to the start of fields 5 (ADDRESS) and 6 (NUMBER). D is the pointer to the start of field 7 (ROAD). E is the pointer to the start of field 8 (AGE) and F is the pointer to the start of field 9 (PAY). The unused characters in each field are represented here by commas.
The string is 41 characters long because the FORENAME and SURNAME fields hold 10 alphanumeric characters each, the NUMBER field holds 3 alphanumeric characters, the ROAD field holds 10 alphanumeric characters, the AGE field holds 2 numeric characters and the PAY field holds 6 numeric characters.
4.4 Compiling COBOL into a Pseudo Code
The Procedure division is the only part of a COBOL program for which this implementation stores pseudo code. The rest of the program is not redundant, however; the symbol table is constructed from the Data division, for example. For brevity, only the pseudo code produced when the Procedure division is compiled is described here; the pseudo code produced for the rest of the program follows the same pattern.
Each element of the pseudo code is described using the following key:
- -1:
  - not a recognized word;
  - a label not yet defined.
- 2-digit integer:
  - a COBOL reserved word (listed below);
  - a file number (1 to 5, where 5 is the Working Storage section);
  - a label number (1 to 20);
  - a literal number (1 to 20).
- 3-digit integer: a variable number calculated by the following formula: [(file number + 1) * 100] + field number. For example, variable number 206 means the sixth field of the first file.
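The variable number formula is easy to check with a small Python sketch. The function names are hypothetical; only the formula itself comes from the key above.

```python
def variable_number(file_number, field_number):
    """Encode a field as (file number + 1) * 100 + field number."""
    return (file_number + 1) * 100 + field_number


def split_variable_number(number):
    """Decode a variable number back into (file number, field number)."""
    hundreds, field_number = divmod(number, 100)
    return hundreds - 1, field_number
```

For example, `variable_number(1, 6)` gives 206, and `split_variable_number(607)` gives file 5 (the Working Storage section) and field 7.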
This implementation of COBOL has the following reserved words:
Division headings:
1. IDENTIFICATION
2. ENVIRONMENT
3. DATA
4. PROCEDURE
Section headings:
5. CONFIGURATION
6. INPUT_OUTPUT
7. FILE
8. WORKING_STORAGE
Data operations:
9. MOVE
10. ADD
11. SUBTRACT
12. MULTIPLY
13. DIVIDE
14. DISPLAY
Data handling:
15. OPEN
16. CLOSE
17. INPUT
18. OUTPUT
19. READ
20. WRITE
21. SELECT
General:
22. PROGRAM_ID
23. AUTHOR
24. DATE_WRITTEN
25. REMARKS
26. SOURCE_COMPUTE
27. OBJECT_COMPUTER
28. FILE_CONTROL
29. unused
30. unused
31. RUN
32. STOP
33. DIVISION
34. SECTION
35. DISK
36. BBC_MICRO
37. ASSIGN
38. TO
39. BY
40. FROM
41. GIVING
42. PIC
43. VALUE
44. AT
45. END
46. FD
47. GO
48. IS
49. LABEL
50. RECORDS
51. ARE
52. OMITTED
53. KEYBOARD
54. IF
55. EQUAL
56. GREATER
57. SMALLER
58. THAN
Before generating the pseudo code, each line of COBOL is reduced by removing the reserved words that are not converted into pseudo code. For example, the line `MULTIPLY HOURS1 BY RATE1 GIVING GROSS_PAY` is reduced to `MULTIPLY HOURS1 RATE1 GROSS_PAY`. The reserved words BY and GIVING are not converted into pseudo code and are deleted. The following pseudo code is then generated:
12 607 605 602
The first number, 12, is the number of the MULTIPLY reserved word, as listed above. The numbers 607, 605 and 602 are the variable numbers of the fields `HOURS1`, `RATE1` and `GROSS_PAY`, as calculated using the above formula.
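The reduction and code generation steps can be sketched in Python as follows. This is a simplified illustration, not the package's BASIC code. The table names and functions are hypothetical, the COBOL statement in the usage note is reconstructed from the description above, and the set of dropped words contains only BY and GIVING because those are the only two the text names.

```python
RESERVED = {"MULTIPLY": 12}   # reserved word number given in the text
DROPPED = {"BY", "GIVING"}    # words removed before code generation
# Variable numbers quoted in the text for the three fields.
SYMBOLS = {"HOURS1": 607, "RATE1": 605, "GROSS_PAY": 602}


def reduce_line(line):
    """Split a statement into words and drop those that produce no code."""
    words = line.strip().rstrip(".").split()
    return [word for word in words if word not in DROPPED]


def generate(words):
    """Translate each remaining word into its pseudo code number."""
    codes = []
    for word in words:
        if word in RESERVED:
            codes.append(RESERVED[word])
        elif word in SYMBOLS:
            codes.append(SYMBOLS[word])
        else:
            codes.append(-1)  # not a recognized word, per the key
    return codes
```

With these tables, `generate(reduce_line("MULTIPLY HOURS1 BY RATE1 GIVING GROSS_PAY."))` yields the four numbers 12, 607, 605 and 602.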
The following program fragment—taken from the payroll test program C.PROCESS—shows each line of source code of a Procedure division and the corresponding pseudo code produced by the compiler. A plus sign (+) at the beginning of a line of pseudo code indicates that the pseudo code will be stored. Because the Procedure division statement is not an instruction that will be executed, the pseudo code for that line is not stored.
4.5 File Structures
Four types of file are used by the package:
- COBOL source code files;
- object pseudo code files;
- data files created and used by COBOL programs; and
- data files used by the compiler and run time interpreter.
COBOL Source Code Files
COBOL source code files are plain ASCII text files produced by a word processor or a text editor. They have a simple structure: a sequence of characters terminated by an end of file marker.
Object Pseudo Code Files
Pseudo code files are divided into two sections:
- the pseudo code itself
- the description of the six available files
arranged in the following format:
The `<FILE n DETAILS>` item specifies the structure of the records in a file in the following format:
The `<POINTER>` specifies the start of the field in the string representation of the record, the `<TYPE>` specifies the type (for example, AAA, X(20) or S99V9) and the `<LENGTH>` specifies the number of characters allowed in the field.
The `<CONTENTS OF FILE n>` item is the initial value of the fields in the record, as specified by the `VALUE IS` clause in the Data division.
The `<FILE NAME>` item is the external filename of the file.
The `<FILE TYPE>` item specifies whether the file is an input file (-1) or an output file (1).
Data Files Created by COBOL Programs
The structure of these files depends on the record defined in the Data division and therefore cannot be described here. The disk filing system handles all end of file markers, etc.
Compiler and Run Time Interpreter Data Files
The compiler stores the following information in a data file:
- the reserved words;
- the compilation error messages; and
- the statements that are compulsory in this implementation.
This file has the following structure:
The run time interpreter stores the run time error messages in a data file in the following format:
5. Testing and Evaluation
The package was tested with a payroll application, which is a typical business data processing application. The payroll application was implemented by four programs:
- C.CMASTER: a program to create the master file from data entered with the keyboard;
- C.CTRANS: a program to create the transaction file from data entered with the keyboard;
- C.PROCESS: a program to process the master and transaction files to produce a new master file;
- C.DNEW: a program to display the contents of the new master file to verify the expected results.
The payroll is an elementary application that works as follows. A master record and a transaction record are loaded into the computer and the new amount to be paid is calculated using the following algorithm:
- `gross_pay = hours * rate`
- `rate = rate * over_rate`
- `over_pay = over_time * rate`
- `gross_pay = gross_pay + over_pay`
- `gross_pay = gross_pay + bonus`
- `pay = pay + gross_pay`
The master record is updated with the new amount to be paid and the payroll processing program writes the updated record to a new file called C.NEW that has the same structure as the master file.
The payroll assumes there is a transaction record for each master record and that the transaction records are sorted in the same order as the master records.
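The calculation and the record matching can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that `pay` is held in the master record and that the other values come from the transaction record, which the text does not state explicitly.

```python
def new_pay(pay, hours, rate, over_rate, over_time, bonus):
    """Apply the payroll algorithm step by step and return the new pay."""
    gross_pay = hours * rate
    rate = rate * over_rate        # overtime is paid at an enhanced rate
    over_pay = over_time * rate
    gross_pay = gross_pay + over_pay
    gross_pay = gross_pay + bonus
    return pay + gross_pay


def process(masters, transactions):
    """Pair each master record with its transaction, in file order."""
    if len(masters) != len(transactions):
        raise ValueError("each master record needs one transaction record")
    return [
        dict(master, pay=new_pay(master["pay"], **transaction))
        for master, transaction in zip(masters, transactions)
    ]
```

Pairing the records by position relies on the two files being sorted in the same order; a mismatch in the number of records is reported rather than silently ignored.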
The four payroll application programs used to test the package are listed below.
C.CMASTER
C.CMASTER creates the master file from data entered at the keyboard.
C.CTRANS
C.CTRANS creates the transaction file from data entered at the keyboard.
C.PROCESS
C.PROCESS processes the master and transaction files to produce a new master file.
C.DNEW
C.DNEW displays the contents of the new master file to verify that C.PROCESS produces the correct results.
6. Conclusions and Further Work
The package has been tested extensively: the test programs compile and run without error and produce the expected results, so the package has passed these tests. Passing them does not prove the package is error free, however; errors may remain that the test programs do not exercise.
The limited 32K memory is the greatest obstacle to enhancing the package, and few improvements can be made without more memory. One enhancement that does not require more memory is the addition of an indexed sequential filing system, which is a more natural filing system for COBOL. Using a different disk filing system, such as Acorn’s hierarchical Advanced Disk Filing System (ADFS), would enable more files to be processed simultaneously.
Expanding the computer by adding extra memory, or by upgrading to a BBC Master or BBC Master Compact, would enable several enhancements, including:
- the introduction of arrays and algebraic expressions;
- the availability of more fields in each data record;
- the inclusion of numbers and literals in a statement, such as `ADD 1 TO COUNT GIVING COUNT.` and `IF CHAR GREATER THAN "*" GO TO LABEL`; and
- the detection and neat trapping of more compilation and run time errors.
With enough memory, a full implementation of COBOL could be produced, but that is beyond the scope of this project.