Wednesday, December 15, 2010

FILE HANDLING

4.1 Introduction.
A file is a collection of data related to a set of entities and typically exists on a magnetic tape or a disk. The data contained in a file is logically organised into records, where a record is an ordered set of related data items. The individual data items in a record are its fields. For example, a file may contain data related to the employees of HTMT. The data pertaining to any one employee corresponds to a record, and the fields may be employee number, employee name, employee's mail-id, employee's age, etc. The number of characters in a field is known as the field size, and the cumulative size of all fields in a record is known as the record size. There are restrictions on the record size; for example, the lower and upper limits of the record size may be 2 bytes and 4096 bytes respectively.
The records in a file may be of fixed length or variable length. Most applications use fixed length records. However, in certain cases records must be of different lengths, for example when some of the fields are irrelevant to a set of records. For variable length records, the first four bytes of each record are usually reserved for the record length.
A record as defined above is often referred to as a logical record. When processing large files stored on disks or tapes, it is inefficient to read or write a single record at a time. Instead, the usual practice is to group a number of consecutive records into what is known as a physical record or block. For instance, if a block contains 4 logical records, the first 4 records form the first block, the next 4 the second block, and so on. The number of logical records in a block is termed the blocking factor. Blocking logical records into physical records has two advantages. Firstly, it saves I/O time, because each physical read or write transfers a whole block, so the file needs roughly one I/O operation per block instead of one per record. Secondly, it saves storage space, because the gaps that the device leaves between physical records (inter-block gaps) occur once per block rather than once per record.
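As a rough illustration of the I/O saving, the sketch below counts the physical transfers needed for a file with and without blocking. The record count (1000) and blocking factor (4) are example values, not figures from a real file; with them, 1000 unblocked I/Os drop to 250 blocked ones.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BLKCALC.
      * Sketch: how blocking reduces the number of physical I/Os.
      * The record count and blocking factor are example values.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD-COUNT     PIC 9(6) VALUE 1000.
       01  WS-BLOCKING-FACTOR  PIC 9(3) VALUE 4.
       01  WS-BLOCK-COUNT      PIC 9(6) VALUE ZERO.
       01  WS-LEFTOVER         PIC 9(3) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Unblocked: one physical I/O per logical record.
           DISPLAY "I/Os WITHOUT BLOCKING : " WS-RECORD-COUNT
      * Blocked: one physical I/O per block. A partly filled last
      * block still needs its own I/O, so round the quotient up.
           DIVIDE WS-RECORD-COUNT BY WS-BLOCKING-FACTOR
               GIVING WS-BLOCK-COUNT REMAINDER WS-LEFTOVER
           IF WS-LEFTOVER > ZERO
               ADD 1 TO WS-BLOCK-COUNT
           END-IF
           DISPLAY "I/Os WITH BLOCKING    : " WS-BLOCK-COUNT
           STOP RUN.
```

Changing the blocking factor to a value that does not divide the record count exactly (for example 3) exercises the rounding step for the short last block.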
MVS uses the programmer's description of the record to set aside enough memory to hold one record at a time. This memory is termed the record buffer, and it is the only link between the program and the physical file. Record buffers are created when the corresponding file is opened and destroyed when it is closed. MVS can overlap I/O operations with CPU operations; to take advantage of this feature, we may specify the number of buffers that the program can use. Experience shows that two buffers are very effective: while the system reads the next record into one buffer, the program processes the record already read into the other.
4.2 File Operations.
There are three basic types of file operations:
1. Create: producing a brand new file and writing one or more logical records into it.
2. Retrieve: reading logical records from the file.
3. Update: maintaining the records in a file so that it is up to date. There are three types of update operations: record deletion, record insertion and record modification.

4.3 File ORGANIZATION and ACCESS MODE
The term file ORGANIZATION refers to the way in which the logical records are organized within the file. The ACCESS MODE refers to the way in which the records in the file will be accessed (sequentially or randomly). ANSI COBOL provides three standard file ORGANIZATIONs, viz. Sequential, Indexed and Relative. For each of these three ORGANIZATIONs, there exist one or more ACCESS MODEs.
4.3.1 SEQUENTIAL ORGANIZATION.
There are two categories of Sequential ORGANIZATION, viz. Entry sequential and Line sequential. In Entry Sequential ORGANIZATION, the records are stored in the file in the same order in which they are entered. The records can be accessed only sequentially, i.e. to process any record, one has to read all the records that precede it. Records cannot be inserted or deleted; new records can only be added to the end of the file. However, a record can be overwritten if the lengths of the old record and the new record match.
In Line sequential ORGANIZATION, each record is a sequence of characters ending with a record terminator. The terminator is not counted in the length of the record. When records are written to a line sequential file, trailing blanks (if any) are removed; when a record is read from a line sequential file, characters are read one at a time into the record buffer until the first record terminator is encountered. Note that Line sequential ORGANIZATION is available only on workstations.
Though sequential files are the most storage efficient and the simplest to handle, they are highly inflexible, as they do not allow insertion and deletion of records. For this reason, sequential files are not normally used for permanent storage but rather as scratch files (files that are used once or twice and then destroyed).
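The following is a minimal sketch of line sequential behaviour, assuming a compiler that accepts ORGANIZATION IS LINE SEQUENTIAL (an extension to standard COBOL, supported for example by GnuCOBOL) and a file name "NOTES.TXT" chosen only for illustration. It writes two short records into a 40-byte record area and reads them back; in implementations such as GnuCOBOL the unused part of the record area is typically filled with blanks on input, which the brackets in the DISPLAY make visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LSDEMO.
      * Sketch for a compiler that supports LINE SEQUENTIAL (an
      * extension, e.g. GnuCOBOL). "NOTES.TXT" is an example name.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT NOTE-FILE ASSIGN TO "NOTES.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NOTE-FILE.
       01  NOTE-LINE           PIC X(40).
       WORKING-STORAGE SECTION.
       01  WS-EOF              PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN OUTPUT NOTE-FILE
      * Each WRITE produces one text line; the trailing blanks of
      * the 40-byte record are not written to the file.
           MOVE "FIRST LINE" TO NOTE-LINE
           WRITE NOTE-LINE
           MOVE "SECOND LINE" TO NOTE-LINE
           WRITE NOTE-LINE
           CLOSE NOTE-FILE
      * Each READ stops at the next record terminator.
           OPEN INPUT NOTE-FILE
           PERFORM UNTIL WS-EOF = "Y"
               READ NOTE-FILE
                   AT END
                       MOVE "Y" TO WS-EOF
                   NOT AT END
                       DISPLAY "[" NOTE-LINE "]"
               END-READ
           END-PERFORM
           CLOSE NOTE-FILE
           STOP RUN.
```

Opening NOTES.TXT in a text editor afterwards should show two ordinary lines with no trailing blanks.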
4.3.2 INDEX SEQUENTIAL ORGANIZATION.
The index sequential ORGANIZATION is the most sophisticated and widely used of the three file ORGANIZATIONs. It allows both sequential and random access of records. An index sequential file is conceptually made up of two files, a data file and an index file. Each record has a key value associated with it. Though the records are stored in the order in which they are entered, a sorted index is maintained that relates each key value to the position of the record in the file, and this provides a way to access the records both sequentially and randomly. There are several methods for storing the index. One method is the sparse index, in which records within a particular range of key values are stored together and the index holds one entry per range rather than one per record. In the latest indexed file implementations, a dense index is maintained instead: the key value of each record is stored in the index together with the address in the file where the data is to be found. This index is maintained as a B-tree to speed up searches and changes.
In Index Sequential ORGANIZATION, we can also use alternate indexes. For example, if a file contains data pertaining to the employees of HTMT, it may be necessary to access the data by employee number as well as by mail-id; the mail-id can then be used as an alternate index. Note that additional indexes improve read performance but degrade write performance, because every index has to be updated on each write operation.
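To make the alternate index idea concrete, here is a sketch of an indexed employee file with the mail-id declared as an ALTERNATE RECORD KEY and then used for a random READ. It assumes a compiler and runtime with indexed file support; the file name "EMP.DAT", the field names and the two employees are hypothetical examples, and under MVS the ASSIGN would name a DD statement instead of a literal file name.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALTKEY.
      * Sketch: an indexed employee file with EMP-MAIL as an
      * alternate key. File, field names and data are examples only.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMP-FILE ASSIGN TO "EMP.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS EMP-NO
               ALTERNATE RECORD KEY IS EMP-MAIL.
       DATA DIVISION.
       FILE SECTION.
       FD  EMP-FILE.
       01  EMP-REC.
           05  EMP-NO          PIC 9(4).
           05  EMP-NAME        PIC X(15).
           05  EMP-MAIL        PIC X(20).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN OUTPUT EMP-FILE
      * Every WRITE updates both the primary and the alternate index.
           MOVE 1001 TO EMP-NO
           MOVE "ASHA" TO EMP-NAME
           MOVE "asha@example.com" TO EMP-MAIL
           WRITE EMP-REC
           MOVE 1002 TO EMP-NO
           MOVE "RAVI" TO EMP-NAME
           MOVE "ravi@example.com" TO EMP-MAIL
           WRITE EMP-REC
           CLOSE EMP-FILE
      * Random read through the alternate key instead of EMP-NO.
           OPEN INPUT EMP-FILE
           MOVE "ravi@example.com" TO EMP-MAIL
           READ EMP-FILE KEY IS EMP-MAIL
               INVALID KEY
                   DISPLAY "NO EMPLOYEE WITH THAT MAIL-ID"
               NOT INVALID KEY
                   DISPLAY "FOUND EMP-NO " EMP-NO " " EMP-NAME
           END-READ
           CLOSE EMP-FILE
           STOP RUN.
```

If several employees could share a mail-id, the alternate key would have to be declared WITH DUPLICATES.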
4.3.3 RELATIVE ORGANIZATION.
In relative ORGANIZATION, a file is thought of as a string of record areas, each of which contains a single record. Each record area is identified by a relative record number, and the access method stores and retrieves a record based on its relative record number: the first record area has relative record number 1, the second has relative record number 2, and so on. The physical sequence in which the records were placed in the file has no bearing on a record's relative record number. As with Index Sequential ORGANIZATION, the records can be accessed both sequentially and randomly.
A relative file provides the fastest access to records but has some disadvantages. Record areas occupy space even when the intermediate records they are meant for are missing. Hence relative organization is suitable for data whose key can be converted to a unique record number through some transformation.
The transformation must pack the records densely so that the file is well filled. Problems also arise if the transformation is not unique, i.e. if two keys map to the same record number. Another disadvantage of this kind of organisation is that, for a hashing function to be effective, some idea of the average number of records that will be present in the file is required. This information may not be available.
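The sketch below shows a relative file addressed through a computed relative record number. The transformation RRN = EMP-NO - 1000 is a hypothetical one, chosen so that employee numbers 1001 to 1999 map to distinct record areas; a real file would need a transformation suited to its own keys. The file name "EMPREL.DAT" and the data are examples. Writing employee 1005 before 1002 shows that the order of writing has no bearing on where a record lands.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RELDEMO.
      * Sketch: a relative file addressed by a computed record number.
      * Assumed transformation: RRN = EMP-NO - 1000, so employees
      * 1001..1999 map to record areas 1..999.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT REL-FILE ASSIGN TO "EMPREL.DAT"
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS WS-RRN.
       DATA DIVISION.
       FILE SECTION.
       FD  REL-FILE.
       01  REL-REC.
           05  REL-EMP-NO      PIC 9(4).
           05  REL-NAME        PIC X(15).
       WORKING-STORAGE SECTION.
      * The RELATIVE KEY must be defined outside the file's record.
       01  WS-RRN              PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN OUTPUT REL-FILE
           MOVE 1005 TO REL-EMP-NO
           MOVE "ASHA" TO REL-NAME
           PERFORM WRITE-PARA
           MOVE 1002 TO REL-EMP-NO
           MOVE "RAVI" TO REL-NAME
           PERFORM WRITE-PARA
           CLOSE REL-FILE
      * Record areas 1, 3 and 4 were never written: empty slots.
           OPEN INPUT REL-FILE
           COMPUTE WS-RRN = 1005 - 1000
           READ REL-FILE
               INVALID KEY
                   DISPLAY "RECORD AREA IS EMPTY"
               NOT INVALID KEY
                   DISPLAY "FOUND " REL-NAME
           END-READ
           CLOSE REL-FILE
           STOP RUN.
       WRITE-PARA.
      * Apply the transformation, then write to that record area.
           COMPUTE WS-RRN = REL-EMP-NO - 1000
           WRITE REL-REC
               INVALID KEY
                   DISPLAY "RECORD AREA ALREADY USED: " WS-RRN
           END-WRITE.
```

The INVALID KEY branch on WRITE is where a non-unique transformation would show up, as two keys competing for the same record area.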

4.4 Making entries for a file in a program

There are two types of entries required in a program for any file: file description entries and record description entries. The file description entries are made in the ENVIRONMENT DIVISION; they associate the file used in the program with a physical file and specify its organization, access mode, buffers and status field. The record description entries are made in the DATA DIVISION; they specify the physical aspects of the data, such as the size relationship between physical and logical records, the size and name(s) of the logical record(s) and labeling information, and they describe the logical records themselves, including the category and format of the data within each field, the different values the data might be assigned, etc.

4.4.1 File description entries for a sequential file

The file description entries are made in the FILE-CONTROL paragraph of the ENVIRONMENT DIVISION using the SELECT ... ASSIGN clause. Its format is given below.
SELECT logical-file-name ASSIGN TO physical-file-name
[; RESERVE integer{AREA, AREAS}]
[; ORGANIZATION IS SEQUENTIAL]
[; ACCESS MODE IS SEQUENTIAL]
[; FILE STATUS IS data-name]

Every file must have a SELECT ... ASSIGN clause. Its purpose is to establish a relationship between the logical file name (internal to COBOL) used in the program and the physical file name (external file name) used to store the file on DASD. After the relationship between physical and logical records has been established, only logical records are made available to the programmer. For this reason, when we say record, we mean the logical record and not the physical record.
 
4.4.1.1 RESERVE Clause

The RESERVE clause allows the user to specify the number of record buffers to be allocated at run time for the file. Thus RESERVE 2 AREAS means that two record buffers are to be allocated. If the RESERVE clause is omitted, the number of buffers at run time is taken from the DD statement when running under MVS. If none is specified there either, the system default is used.

4.4.1.2 Organization clause

The ORGANIZATION clause identifies the logical structure of the file. The logical structure is established when the file is created and cannot subsequently be changed. If you omit the ORGANIZATION clause, the compiler assumes ORGANIZATION IS SEQUENTIAL.

4.4.1.3 Access mode clause

The ACCESS MODE clause defines the manner in which the records of the file are made available for processing. If the ACCESS MODE clause is not specified, sequential access is assumed.

4.4.1.4 File Status clause

The FILE STATUS clause monitors the execution of each input-output operation on the file. When it is specified, the system moves a value into the two-byte alphanumeric data-name, defined in the WORKING-STORAGE SECTION, after each input-output operation that explicitly or implicitly refers to this file. The value indicates the status of execution of the statement. It is very good practice to code the FILE STATUS clause for every file.
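Putting the clauses of this section together, the following sketch codes RESERVE, ORGANIZATION, ACCESS MODE and FILE STATUS on one SELECT and checks the status after OPEN and READ. The file name "INFILE.DAT" and the data names are examples, and under MVS the ASSIGN would name a DD statement. The codes tested are standard COBOL-85 values: "00" for success and "10" for end of file; an OPEN INPUT on a file that does not exist typically returns "35".

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FSDEMO.
      * Sketch: a sequential SELECT with every optional clause coded
      * and the FILE STATUS checked after OPEN and READ.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IN-FILE ASSIGN TO "INFILE.DAT"
               RESERVE 2 AREAS
               ORGANIZATION IS SEQUENTIAL
               ACCESS MODE IS SEQUENTIAL
               FILE STATUS IS WS-IN-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE.
       01  IN-REC              PIC X(80).
       WORKING-STORAGE SECTION.
      * Two-byte status field, set after every I/O on IN-FILE.
       01  WS-IN-STATUS        PIC XX.
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT IN-FILE
           IF WS-IN-STATUS NOT = "00"
               DISPLAY "OPEN FAILED, FILE STATUS = " WS-IN-STATUS
               STOP RUN
           END-IF
           READ IN-FILE
           EVALUATE WS-IN-STATUS
               WHEN "00"
                   DISPLAY "FIRST RECORD: " IN-REC
               WHEN "10"
                   DISPLAY "FILE IS EMPTY"
               WHEN OTHER
                   DISPLAY "READ FAILED, FILE STATUS = " WS-IN-STATUS
           END-EVALUATE
           CLOSE IN-FILE
           STOP RUN.
```

Running it once without INFILE.DAT present exercises the OPEN failure branch.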

4.4.2 Record description entries for a sequential file

The record description entries are made in the FILE SECTION of the DATA DIVISION, under the FD entry for the file.
Format 1: (fixed length records)
FD filename
[; RECORD CONTAINS integer-1 CHARACTERS]
[; BLOCK CONTAINS integer-2 {RECORDS, CHARACTERS}]
[; DATA {RECORD IS, RECORDS ARE} data-name-1 [, data-name-2]...]

Format 2: (variable length records)
FD filename
[; RECORD CONTAINS integer-1 TO integer-2 CHARACTERS]
[; BLOCK CONTAINS integer-3 TO integer-4 {RECORDS, CHARACTERS}]
[; DATA {RECORD IS, RECORDS ARE} data-name-1 [, data-name-2]...]

4.4.2.1 RECORD CONTAINS clause

The RECORD CONTAINS clause specifies the size of the logical records. This clause cannot be used for LINE SEQUENTIAL files.

4.4.2.2 BLOCK CONTAINS clause

The BLOCK CONTAINS clause specifies the size of the physical records. If the records in the file are not blocked, the BLOCK CONTAINS clause can be omitted; when it is omitted, the compiler assumes that the records are not blocked. Note that even when each physical record contains only one complete logical record, coding BLOCK CONTAINS 1 RECORD results in fixed blocked records. The BLOCK CONTAINS clause can also be omitted when the associated file-control entry specifies a VSAM file, since the concept of blocking has no meaning for VSAM files.
4.4.2.3 Data record clause

The data record clause specifies the record names defined for the file. It is used only to provide better documentation.
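As an illustration of these FD clauses, here is a sketch that writes ten fixed-length 80-byte records with BLOCK CONTAINS 4 RECORDS. The file name "BLOCKED.DAT" and the record layout are examples. Compilers that manage blocking themselves (GnuCOBOL, for instance, or any compiler for VSAM files as noted above) may treat BLOCK CONTAINS as documentation only.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BLKFD.
      * Sketch: fixed-length 80-byte records, 4 per block. Some
      * compilers treat BLOCK CONTAINS as documentation only.
      * File name and data are examples.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT OUT-FILE ASSIGN TO "BLOCKED.DAT"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  OUT-FILE
           RECORD CONTAINS 80 CHARACTERS
           BLOCK CONTAINS 4 RECORDS
           DATA RECORD IS OUT-REC.
       01  OUT-REC.
           05  OUT-SEQ         PIC 9(4).
           05  FILLER          PIC X(76).
       WORKING-STORAGE SECTION.
       01  WS-COUNT            PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN OUTPUT OUT-FILE
      * Ten logical records: with 4 per block they fill two full
      * blocks and a short third block of 2 records.
           PERFORM 10 TIMES
               ADD 1 TO WS-COUNT
               MOVE SPACES TO OUT-REC
               MOVE WS-COUNT TO OUT-SEQ
               WRITE OUT-REC
           END-PERFORM
           CLOSE OUT-FILE
           STOP RUN.
```

The program still writes one logical record per WRITE; grouping the records into blocks is left to the IOCS.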

4.5 PROCEDURE DIVISION statements for sequential files

4.5.1 OPEN statement

The OPEN statement initiates the processing of files. The successful execution of an OPEN statement determines the availability of the file for processing. A file is available if it is physically present and is recognized by the input-output control system (IOCS, a subsystem of MVS that supports file processing). The successful execution of the OPEN statement makes the associated record area available to the program; it does not obtain or release the first data record. If the FILE STATUS clause is specified in the FILE-CONTROL entry, the associated status key is updated when the OPEN statement is executed.

Format:
OPEN {INPUT, OUTPUT, EXTEND, I-O} file-name-1 [, file-name-2]...
[{INPUT, OUTPUT, EXTEND, I-O} file-name-3 [, file-name-4]...]...

A sequential file can be opened in one of the following four modes: INPUT, OUTPUT, EXTEND and I-O.

  • A file can be opened in the INPUT mode only if it already exists. Such a file becomes an input file from which records can be read sequentially.
  • When a file is created for the first time, it must be opened in the OUTPUT mode. Note that opening an existing file in the OUTPUT mode results in the loss of all its data.
  • The EXTEND mode also opens a file for writing, but the file pointer is positioned after the end of the last record. Thus any records written are appended to the file.
  • A file is opened in the I-O mode when it needs to be updated. This mode allows both reading and rewriting of records.
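The difference between the OUTPUT and EXTEND modes can be seen in a short sketch. It assumes an example file "LOG.DAT": the OPEN OUTPUT creates the file with one record, the OPEN EXTEND appends a second, and the OPEN INPUT reads both back in the order written.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXTDEMO.
      * Sketch: appending to an existing sequential file with EXTEND.
      * "LOG.DAT" and the record layout are examples only.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT LOG-FILE ASSIGN TO "LOG.DAT"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  LOG-FILE.
       01  LOG-REC             PIC X(40).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * OUTPUT creates the file, destroying any old contents.
           OPEN OUTPUT LOG-FILE
           MOVE "RECORD WRITTEN IN OUTPUT MODE" TO LOG-REC
           WRITE LOG-REC
           CLOSE LOG-FILE
      * EXTEND positions after the last record, so this is appended.
           OPEN EXTEND LOG-FILE
           MOVE "RECORD APPENDED IN EXTEND MODE" TO LOG-REC
           WRITE LOG-REC
           CLOSE LOG-FILE
      * INPUT reads the records back in the order they were written.
           OPEN INPUT LOG-FILE
           PERFORM 2 TIMES
               READ LOG-FILE
                   AT END
                       DISPLAY "UNEXPECTED END OF FILE"
                   NOT AT END
                       DISPLAY LOG-REC
               END-READ
           END-PERFORM
           CLOSE LOG-FILE
           STOP RUN.
```

Replacing OPEN EXTEND with a second OPEN OUTPUT would leave only the second record in the file.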

4.5.2 CLOSE statement

The CLOSE statement terminates the processing of the file. When it is executed, the IOCS performs the end-of-file processing and the record buffer created for the file is destroyed, so the link between the program and the file is lost. From COBOL-85 onwards the CLOSE statement is optional, because the STOP RUN statement automatically closes all the files opened by the program. The CLOSE statement can be used with the LOCK option, which prevents the file from being opened again during the same run of the program.

Format: CLOSE file-name-1 [WITH LOCK] [, file-name-2 [WITH LOCK]]...

Note: though it is possible to OPEN and CLOSE more than one file in a single statement, programmers are advised not to do so. If the opening or closing of one of the files is unsuccessful, a single error check after the statement makes it hard to identify which file caused the failure.

4.5.3 READ statement

If a file is opened in the INPUT or the I-O mode, we can use the READ statement to make the next logical record from the file available to the object program. Though the primary function of the READ statement is to fetch records from a file and position the file pointer appropriately for the next READ, it also performs certain checks to ensure proper execution of the program. It checks the length of the input record to ensure that it corresponds to the length specified in the RECORD CONTAINS clause. It also uses the BLOCK CONTAINS clause, if specified, to check the blocking factor. The READ statement can be used with the INTO option to get a copy of the logical record into a WORKING-STORAGE variable. It can also be used with the AT END and NOT AT END phrases. The AT END phrase is executed when there are no more records to read, so the programmer can decide what to do once the input is exhausted. The NOT AT END phrase is executed when a record has been read successfully, and can be used to carry out specific processing on that record.

Format:

READ file-name [INTO data-name]
[ AT END imperative statements]
[NOT AT END imperative statements]
[END-READ]
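A typical read loop built from these phrases is sketched below. It assumes a file "STUD.DAT" laid out like the student records in the SEQFCRE program later in this post; the end-of-file flag and the counter are illustrative names. NOT AT END does the per-record work, and AT END only sets the flag that ends the loop.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEQREAD.
      * Sketch: the usual read loop for a sequential file. Assumes
      * "STUD.DAT" holds 80-byte records in the student layout.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT STUDFILE ASSIGN TO "STUD.DAT"
               FILE STATUS IS WS-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  STUDFILE.
       01  FS-STUD-REC         PIC X(80).
       WORKING-STORAGE SECTION.
       01  WS-STATUS           PIC XX.
       01  WS-EOF              PIC X VALUE "N".
       01  WS-COUNT            PIC 9(5) VALUE ZERO.
       01  WS-STUD-REC.
           05  WS-REGNO        PIC X(5).
           05  WS-NAME         PIC A(15).
           05  WS-AGE          PIC 99.
           05  FILLER          PIC X(58).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT STUDFILE
           IF WS-STATUS NOT = "00"
               DISPLAY "CANNOT OPEN STUDFILE, STATUS " WS-STATUS
               STOP RUN
           END-IF
           PERFORM UNTIL WS-EOF = "Y"
      * INTO copies each record from the buffer to WS-STUD-REC.
               READ STUDFILE INTO WS-STUD-REC
                   AT END
                       MOVE "Y" TO WS-EOF
                   NOT AT END
                       ADD 1 TO WS-COUNT
                       DISPLAY WS-REGNO " " WS-NAME " " WS-AGE
               END-READ
           END-PERFORM
           DISPLAY "RECORDS READ: " WS-COUNT
           CLOSE STUDFILE
           STOP RUN.
```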

4.5.4 WRITE statement

If a file is opened in the OUTPUT or the EXTEND mode, we can use the WRITE statement to transmit data to the physical file. Once a record has been written to a file, it is no longer available in the record buffer. Note that although we read files, we write records. The WRITE statement can be used with the FROM option to write data directly from a WORKING-STORAGE variable to the required file; otherwise the data must first be moved to the record buffer and then written to the file. Further, the WRITE statement can be used with the ADVANCING option to write records on a fresh line or page.
 
Format:
WRITE record-name [FROM data-name]
[{AFTER, BEFORE} ADVANCING {integer {LINE, LINES}, PAGE}]
[END-WRITE]

4.5.5 REWRITE statement

If a file is opened in the I-O mode and a record has been read successfully into the record buffer, we can use the REWRITE statement to replace that record with its updated version. Like the WRITE statement, the REWRITE statement can be used with the FROM option to write data directly from a WORKING-STORAGE variable to the file.

Format:

REWRITE record-name [FROM data-name]
[END-REWRITE]
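The following sketch puts READ and REWRITE together to update a sequential file in place. It assumes the same "STUD.DAT" student layout as the programs in this post and uses register number "00001" purely as an example; the file is opened in I-O mode and each REWRITE replaces the record returned by the preceding READ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEQUPD.
      * Sketch: in-place update of a sequential file. Assumes the
      * student layout; register number "00001" is an example value.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT STUDFILE ASSIGN TO "STUD.DAT"
               FILE STATUS IS WS-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  STUDFILE.
       01  FS-STUD-REC.
           05  FS-REGNO        PIC X(5).
           05  FS-NAME         PIC A(15).
           05  FS-AGE          PIC 99.
           05  FILLER          PIC X(58).
       WORKING-STORAGE SECTION.
       01  WS-STATUS           PIC XX.
       01  WS-EOF              PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN I-O STUDFILE
           IF WS-STATUS NOT = "00"
               DISPLAY "CANNOT OPEN STUDFILE, STATUS " WS-STATUS
               STOP RUN
           END-IF
           PERFORM UNTIL WS-EOF = "Y"
               READ STUDFILE
                   AT END
                       MOVE "Y" TO WS-EOF
                   NOT AT END
                       PERFORM UPDATE-PARA
               END-READ
           END-PERFORM
           CLOSE STUDFILE
           STOP RUN.
       UPDATE-PARA.
      * REWRITE replaces the record just read. Its length must not
      * change, which holds here because the record is fixed length.
           IF FS-REGNO = "00001"
               ADD 1 TO FS-AGE
               REWRITE FS-STUD-REC
           END-IF.
```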

Program to create a sequential file.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEQFCRE.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * STUDDD is the DD name (external file name) of the new file.
           SELECT STUDFILE ASSIGN TO STUDDD
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  STUDFILE.
       01  FS-STUD-REC.
           05  FS-REGNO        PIC X(5).
           05  FS-NAME         PIC A(15).
           05  FS-AGE          PIC 99.
           05  FILLER          PIC X(58).
       WORKING-STORAGE SECTION.
      * CHOICE holds the user's reply; "Y" lets the loop start.
       01  CHOICE              PIC X(1) VALUE "Y".
       01  WS-STUD-REC.
           05  WS-REGNO        PIC X(5).
           05  WS-NAME         PIC A(15).
           05  WS-AGE          PIC 99.
           05  FILLER          PIC X(58) VALUE SPACES.
       PROCEDURE DIVISION.
       G0000-MAIN-PARA.
           OPEN OUTPUT STUDFILE.
           PERFORM G1000-CREATE-PARA UNTIL CHOICE = "N" OR "n".
           CLOSE STUDFILE.
           STOP RUN.

      * Accepts one student's details and writes them as one record.
       G1000-CREATE-PARA.
           DISPLAY "ENTER THE STUDENT REGISTER NUMBER : ".
           ACCEPT WS-REGNO.
           DISPLAY "ENTER THE STUDENT NAME : ".
           ACCEPT WS-NAME.
           DISPLAY "ENTER THE STUDENT AGE : ".
           ACCEPT WS-AGE.
           WRITE FS-STUD-REC FROM WS-STUD-REC.
           DISPLAY "DO YOU WISH TO CONTINUE [Y/N] : ".
           ACCEPT CHOICE.

Program that creates an Indexed Sequential File
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IND-SEQ.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT PAYFILE ASSIGN TO "PAY.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS SEQUENTIAL
               RECORD KEY IS EMP-NO.
       DATA DIVISION.
       FILE SECTION.
       FD  PAYFILE
           LABEL RECORDS ARE STANDARD.
       01  PAYREC.
           05  EMP-NO          PIC 9(4).
           05  EMP-NAME        PIC A(15).
           05  SALARY          PIC 9(5).
       WORKING-STORAGE SECTION.
       01  REPLY               PIC X   VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM INIT.
           PERFORM CREATE-ROUT UNTIL REPLY = "N".
           PERFORM TERMIN.
           GOBACK.
       INIT.
           OPEN OUTPUT PAYFILE.
       CREATE-ROUT.
           DISPLAY "ENTER EMP-NO".
           ACCEPT EMP-NO.
           DISPLAY "ENTER EMP-NAME".
           ACCEPT EMP-NAME.
           DISPLAY "ENTER EMP-SAL".
           ACCEPT SALARY.
      * With sequential access the keys must be entered in ascending
      * order; an out-of-order or duplicate key raises INVALID KEY.
           WRITE PAYREC
               INVALID KEY
                   DISPLAY "KEY OUT OF SEQUENCE OR DUPLICATE"
           END-WRITE.
           DISPLAY "WISH TO CONTINUE Y/N ".
           ACCEPT REPLY.
       TERMIN.
           CLOSE PAYFILE.
