Secrets of OpenVMS® File I/O


Abstract

This copyrighted document may be reproduced without formal consent from Touch Technologies, Inc., provided that each copy includes this document in its entirety.

Touch Technologies, Inc.
9988 Hibert Street, Suite 310
San Diego, California 92131
(800) 525-2527


NOTICE

Touch Technologies, Inc. (TTI) has prepared this publication for use by TTI personnel, licensees, and customers. This information is protected by copyright. No part of this document may be photocopied, reproduced or translated to another language without prior written consent of Touch Technologies Incorporated.

TTI believes the information described in this publication is accurate and reliable; much care has been taken in its preparation. However, no responsibility, financial or otherwise, is accepted for any consequences arising out of the use of this material.

The information contained herein is subject to change without notice and should not be construed as a commitment by Touch Technologies, Inc.

The following are trademarks of Touch Technologies, Inc., and may be used only to describe products of Touch Technologies, Inc.:

 
DYNAMIC TAPE ACCELERATOR        INTOUCH        CleanDisk  
  
DYNAMIC LOAD BALANCER PLUS        REMOTE DEVICE FACILITY  

The following are trademarks of Digital Equipment Corporation, and may be used only to describe products of Digital Equipment Corporation:

 
DBMS      DCL      DECNET      RDB      RMS      VAX      OpenVMS  

Revised 21-Oct-1995



Note

® VMS is a registered trademark of Digital Equipment Corporation.



Preface

Over the last 20 years, OpenVMS processors have sped up by a factor of 30. Over the same period of time, disk I/O systems have only sped up by a factor of two. This imbalance has caused most OpenVMS systems to switch from being CPU bound to being I/O bound. In addition, most clustered OpenVMS systems suffer from excessive "lock traffic" caused by the increased I/O demands.

Purpose

This technical report addresses both I/O bottleneck problems and solutions.

Topics to be covered include:

Audience

Readers are expected to have used the following VMS utilities:


Chapter 1
MEASURING HOW I/O BOUND YOU ARE

How do you know when you are I/O bound? The easiest way to find out is to use the DEC-provided MONITOR DISK utility as follows:

 
  
  $ MONITOR DISK/ITEM=QUEUE  
  

This DCL command displays the I/O queue depth for each disk device. A depth of two means that, on average, two I/O requests are waiting for the disk at all times. A depth of six means that six requests are waiting, so disk I/O response time is about three times slower than on a disk with an I/O queue depth of two.

Example 1-1 MONITOR DISK Utility

 
  
  
                  OpenVMS Monitor Utility  
                    DISK I/O STATISTICS  
                       on node TTI  
                    25-JAN-1995 13:11:35  
  
   I/O Request Queue Length    CUR        AVE        MIN        MAX  
                              ----       ----       ----       ----  
   $1$DIA0:     TTIVMSRL5     1.20       2.20       0.10       2.50  
   $1$DUB3:     USER          3.65       6.15       0.65       8.65  
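
The display above can also be screened mechanically. The following is a minimal Python sketch, not part of MONITOR itself, that flags disks whose average queue depth exceeds the 0.5 figure this report treats as the sign of a bottleneck. The function name busy_disks and the assumption that each data row has exactly six whitespace-separated fields (device, volume label, CUR, AVE, MIN, MAX) are illustrative choices, not documented behavior.

```python
# Sketch: flag disks whose average I/O queue depth exceeds a threshold.
# Assumes rows shaped like the MONITOR DISK/ITEM=QUEUE display:
#   device, volume label, CUR, AVE, MIN, MAX
THRESHOLD = 0.5  # average queue depth this report treats as a bottleneck

SAMPLE = """\
$1$DIA0:     TTIVMSRL5     1.20       2.20       0.10       2.50
$1$DUB3:     USER          3.65       6.15       0.65       8.65
"""

def busy_disks(report, threshold=THRESHOLD):
    """Return (device, average depth) pairs above the threshold, worst first."""
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers, rulers, and blank lines
        device, _label, _cur, ave, _min, _max = fields
        try:
            ave = float(ave)
        except ValueError:
            continue  # not a data row after all
        if ave > threshold:
            flagged.append((device, ave))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

for device, ave in busy_disks(SAMPLE):
    print(f"{device} average queue depth {ave:.2f}")
```

Both sample disks exceed the threshold, with the USER disk listed first because its average depth is the higher of the two.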
  
  
  

1.1 What Can You Do about an I/O Bound System?

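Once you have determined that you have an I/O bottleneck (average disk I/O queue depth greater than 0.5), steps should be taken to eliminate the bottleneck. The steps are to determine which files are hot (Section 1.2) and then to reduce the I/O bottlenecks those files cause (Chapter 2).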

1.2 Determining Which Files are Hot

Hot files are those with high I/O counts. On most systems, 95% of the I/Os are caused by fewer than 5% of the files. And a full one-third of those files are VMS internal files (pagefiles, installed images, JBCSYSQUE.DAT, ...).

The easiest way to locate your hot files is to use a software package that locates them for you. The Digital Equipment Corp. VPA utility can provide a list of hot files. Third-party software such as Dynamic Load Balancer Plus also provides hot file reports.

Example 1-2 Locating Hot Files

 
  
  
December 1, 1995              DLB Plus                        Page   1  
IOPLUS              Hot File Analysis (High I/O)  
      Node MINI sampled on December 1, 1995 at 06:14 PM by SMITH  
                    Sorted By I/O Rate  
  
                    Avg                      File   Rec   Bkt    File  
File Name          Rate  I/Os  Read%  Frags  Org   Size  Size    Size  
----------------- ----- -----  ------ -----  ---- -----  ----  ------  
SALES_MSTR.DAT;1    5.61 206   100.00   660   IND  1508     8   20007  
MENU.INT_IMG;52     1.21 121   100.00    
PAYROLL_RUN.EXE;4   1.03  60   100.00  
DECW$SERVER.EXE;1    .99  51   100.00  
DECW$SRV_DX.EXE;1    .65  24   100.00  
ACCOUNTNG.DAT;1      .50  10    10.00  
PAYROLL.EXE;400      .49  62   100.00  
     .  
     .  
     .  
                        =====  =======  
   Grand Totals:        14412    98.60    
  

1.2.1 Other Methods of Determining Hot Files

If you don't have access to a software product that will determine your hot files for you, you will need to figure it out manually. Digital provides a DCL command that can assist you. This command is an undocumented/unsupported DCL command.

 
  
  $ SET WATCH/CLASS=MAJOR FILE  
  $ SET WATCH/CLASS=none  FILE    ! "none" turns off the feature  
  

SET WATCH can help you find:

The command invokes an image called SETWATCH.EXE. This command has existed from at least VMS Version 4.0 through Version 6.2.

SET WATCH needs to be installed with CMEXEC privilege.

SET WATCH applies only to the process that issues the command. The command is:

 
  
        SET WATCH/CLASS=MAJOR FILE  
  

MAJOR says, "Tell me all the major operations that are happening with the file system."

The keyword is FILE. This is not a file specification; it is literally the word FILE.

 
  
        SET WATCH/CLASS=ALL FILE  
  

ALL tells you more than you ever wanted to know about what happens when files get opened and what happens when file accesses occur.

SET WATCH also shows you the number of file accesses and deaccesses.

1.2.2 File Access and Deaccess

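Opening a file is called an ACCESS (accessing a file). When you access a file, SET WATCH shows you the file name and its file ID.

Closing a file is called a DEACCESS. A deaccess tells you, on a per-file basis, the number of physical reads and writes that occurred while the file was open.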

Access and deaccess analysis information is always built into VMS; it is always available. And this information is given to you by this undocumented, unsupported command!

1.2.3 Installing SETWATCH.EXE with Privileges

The SYS$SYSTEM:SETWATCH.EXE image must be installed with CMEXEC privileges. This is accomplished by:

 
  
        $ INSTALL:==$INSTALL/COMMAND  
        $ INSTALL ADD SYS$SYSTEM:SETWATCH/PRIV=CMEXEC  
  

The SET WATCH command prints (to SYS$OUTPUT) the name of each file that is opened by an image. And when the file is closed, it prints the number of PHYSICAL reads and writes that occurred.

For example, to monitor the physical I/Os used when we execute VMS mail:

 
  
     $ SET WATCH/CLASS=MAJOR FILE  
     $ MAIL  
       .  
       .  
       .  
      Access SYSUAF.DAT;2 (28,4,0)  
      Access VMSMAIL_PROFILE.DATA;1 (84,12,0)  
      You have 2 new messages.  
  
     MAIL> SEND/NOEDIT  
     To: SYSTEM  
     CC:  
     Subj: new system  
     Enter your message below. Press CTRL/Z when complete, or CTRL/C to  
     quit:  
  
     Create ....... (9881,58,0)  
     How are plans for buying the new 9000 going??  Sure be nice to get rid  
     of this 11/750!!  
     <CTRL/Z>  
     Deaccess (9881,58,0) Reads: 0, Writes: 2  
  
     MAIL> EXIT  
     Deaccess (28,4,0) Reads: 8, Writes: 0  
     Deaccess (84,12,0) Reads: 7, Writes: 2  
     $ SET WATCH/CLASS=NONE FILE  
  

From this we can see that SYSUAF.DAT (file ID=28,4,0) had eight reads and no writes, while VMSMAIL_PROFILE.DATA (file ID=84,12,0) had seven reads and two writes. By using the SET WATCH command and then running a typical I/O intensive application, you can determine which files seem to have the highest I/O counts. Multiplying this by the number of users on your system can give you a reasonably accurate list of hot files.
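
The multiplication described above can be sketched in Python. This is an illustration only: it assumes the Access, Create, and Deaccess lines have exactly the shape shown in the MAIL transcript, and the user count of 50 is a hypothetical figure chosen for the example. The function name estimate_hot_files is likewise made up.

```python
import re

# Lines copied from the MAIL transcript above; the formats are assumed to
# match what SET WATCH/CLASS=MAJOR FILE prints.
TRANSCRIPT = """\
Access SYSUAF.DAT;2 (28,4,0)
Access VMSMAIL_PROFILE.DATA;1 (84,12,0)
Create ....... (9881,58,0)
Deaccess (9881,58,0) Reads: 0, Writes: 2
Deaccess (28,4,0) Reads: 8, Writes: 0
Deaccess (84,12,0) Reads: 7, Writes: 2
"""

ACCESS = re.compile(r"^\s*(?:Access|Create)\s+(\S+)\s+\((\d+,\d+,\d+)\)")
DEACCESS = re.compile(
    r"^\s*Deaccess\s+\((\d+,\d+,\d+)\)\s+Reads:\s*(\d+),\s*Writes:\s*(\d+)"
)

def estimate_hot_files(transcript, users):
    """Return (file name, estimated physical I/Os) pairs, busiest first."""
    names = {}   # file ID -> file name seen on Access/Create
    totals = {}  # file ID -> physical reads + writes for one session
    for line in transcript.splitlines():
        match = ACCESS.match(line)
        if match:
            names[match.group(2)] = match.group(1)
            continue
        match = DEACCESS.match(line)
        if match:
            file_id = match.group(1)
            count = int(match.group(2)) + int(match.group(3))
            totals[file_id] = totals.get(file_id, 0) + count
    # Scale the single-session counts by the number of users.
    rows = [(names.get(fid, fid), n * users) for fid, n in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

USERS = 50  # hypothetical number of users running the same application
for name, ios in estimate_hot_files(TRANSCRIPT, USERS):
    print(f"{name:28} {ios}")
```

The result is only as good as the assumption that every user behaves like the sampled session, so treat the ranking as a starting point rather than a measurement.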


Chapter 2
REDUCING FILE I/O BOTTLENECKS

2.1 What Can You Do about an I/O Bound System?

Once you have determined that you have an I/O bottleneck (average disk I/O queue depth greater than 0.5), steps should be taken to eliminate the bottleneck. The steps are to identify the hot files and then to speed up or eliminate their I/O operations.

By analyzing file I/O operations, hot files (those with high I/O counts) are identified. Hot files consume valuable I/O resources. Once identified, hot files can be moved to your fastest disk devices. Files with high read/write ratios are excellent candidates for local and global data buffering, or can be moved to a RAM disk.

Two major actions can be taken to reduce the I/O bottlenecks caused by files with high I/O counts: speed up the I/O operations (Section 2.2) or eliminate them (Section 2.3).

2.2 Speeding Up I/O Operations

Speeding up a file's I/O operations can be accomplished by moving the file to a faster or less busy device or by spreading the file across multiple spindles (as in a shadow set). Both read and write operations can be sped up using this method.

2.3 Eliminating I/O Operations

Eliminating file I/O operations can be accomplished in a number of ways. Some of these ways include:

Table 2-1 Eliminating File I/O Operations

Method                     Result
-----------------------    --------------------------------
host based data caching    speeds up file reads
RMS global buffering       speeds up file reads
RMS file converts          speeds up both reads and writes
RMS local buffering        speeds up both reads and writes
disk defragmentation       speeds up both reads and writes
file defragmentation       speeds up both reads and writes


Both RMS local buffering and global buffering can be requested for a given file.

2.4 Host Based Data Caching

Host-based data caching uses free memory for high-speed data caching. I/O requests to the file are intercepted by the caching system. If the request is a write, the data is passed through to the disk device and no speedup occurs. If the request is a read and the requested data is already in the memory data cache, the request is satisfied with a very fast memory move and no actual I/O to the disk occurs. Host-based data caching systems are available from a number of commercial software vendors.
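
The behavior described above can be modeled with a toy write-through cache. This Python sketch is not how any particular commercial product works; the class name ReadCache, the dictionary standing in for the disk, and the choice to refresh a cached block on write are all assumptions made for illustration.

```python
class ReadCache:
    """Toy write-through cache keyed by block number (illustrative only)."""

    def __init__(self, disk):
        self.disk = disk       # dict standing in for the disk device
        self.cache = {}        # block number -> data held in memory
        self.disk_reads = 0
        self.disk_writes = 0

    def write(self, block, data):
        # Writes always go to the disk, so they are not sped up.
        self.disk[block] = data
        self.disk_writes += 1
        # Assumption: refresh any cached copy so later reads are not stale.
        if block in self.cache:
            self.cache[block] = data

    def read(self, block):
        if block in self.cache:
            return self.cache[block]   # memory move, no disk I/O
        self.disk_reads += 1           # cache miss: go to the disk
        data = self.disk[block]
        self.cache[block] = data
        return data

cache = ReadCache({1: "alpha", 2: "beta"})
for block in (1, 1, 2, 1):
    cache.read(block)
cache.write(1, "gamma")
print(cache.disk_reads, cache.disk_writes, cache.read(1))
```

Four reads cost only two disk I/Os here because the repeated reads of block 1 are served from memory, while the single write still reaches the disk.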

2.5 RMS File Conversion

As RMS based files are written to, they become internally fragmented and disorganized. Over time, both read and write operations cause extra physical I/O operations to the RMS file. The Digital provided CONVERT utility can be used to defragment and reorganize RMS files. To convert the file MYFILE.DAT, at the DCL prompt enter:

 
  
        $ CONVERT myfile.dat  myfile.new  
  
        $ RENAME  myfile.new  myfile.dat;  ! note the trailing ";"
  

This two-step process safely converts and reorganizes an RMS file.


If the CONVERT fails, DO NOT DO THE RENAME. THIS ENSURES THE INTEGRITY OF YOUR ORIGINAL UNCONVERTED FILE.

2.6 RMS Buffering

RMS moves data from the disk into memory buffers. From the buffers, data is moved into the application program. Whenever the requested data cannot be found in a data buffer, RMS must access the disk to find the data. Accessing the disk is much slower than getting information from a data buffer.

RMS provides two types of file data buffers: local data buffers and global data buffers.

Local data buffers are not shared among processes. Local buffers can only be accessed by the process that they were created for. When RMS opens an indexed file, by default it creates two local data buffers.

Global data buffers are shared among processes. Global buffers can be accessed by all processes that have the file open. By default RMS does not create any global data buffers.

File I/Os can be reduced using either or both of these buffering methods. However, increased buffering requires additional system resources. To avoid running out of system resources, both SYSGEN and AUTHORIZE (SYSUAF) parameter changes are needed.

2.6.1 RMS Local Buffering

RMS indexed files with high file I/O counts can benefit from increased local buffering. As the number of local buffers is increased, more I/O requests can be satisfied from the local buffer cache. In some cases, even write requests can be sped up using local buffering (for deferred write operations).

The number of local buffers used by RMS indexed files can be set on either a per-process basis or system wide. In either case, the Digital-provided SET RMS_DEFAULT command (abbreviated SET RMS in the examples below) is used to specify the number of local buffers.

For example, to set the number of local buffers used for indexed files for ALL users on the system to eight, the following DCL command is used:

 
  
  $ SET RMS/SYSTEM/INDEX/BUFFER=8  
  

To set the number of local buffers used for indexed files for JUST THIS PROCESS to ten, the following DCL command is used:

 
  
  $ SET RMS/INDEX/BUFFER=10  
  

The SET RMS command takes effect the next time a file is opened.

2.6.2 RMS Global Buffering

RMS based hot files with high read I/O percentages (75% or greater) can benefit from increased global buffering. As the number of global buffers is increased, more read I/O requests can be satisfied from the global buffer cache. Write requests are written directly to the disk and are not sped up by global buffering.

To specify the number of global buffers to be used on a file, the file must be closed. To set the number of global buffers on file MYFILE.DAT to thirty, the following DCL command is used:

 
  
  $ SET FILE myfile.dat/GLOBAL=30  
  

After the global buffers are set up on a cluster, the global buffers are created the first time the file is accessed cluster-wide.

2.6.3 An RMS Global Buffering Example

Global buffering uses address space and may use physical pages off the free list. For example, suppose you have a file with a bucket size of 2 and you set up 30 global buffers. When the file is first accessed, 60 pages of address space are allocated (bucket size of 2 x 30 global buffers) to the user's process. The number of physical pages allocated for the first accessor of the file can be from 0 to 60, depending on what the user is doing.

The second accessor of the file would not use any additional physical pages because global pages are shared among processes. The second accessor would, however, have 60 pages of address space allocated to its process.

So, for each user accessing the file, an additional 60 pages of address space are allocated. However, no additional physical memory pages are used because those are shared.
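
The arithmetic in this example can be written out as a short Python sketch. It reads the bucket figure as a bucket size of 2 pages per buffer, which is an interpretation rather than something the text states outright, and the function name and the count of five accessors are hypothetical.

```python
def global_buffer_pages(bucket_size, global_buffers, accessors):
    """Return (address-space pages per process,
               address-space pages across all accessors,
               upper bound on shared physical pages).

    Assumes one page per bucket block, as in the example above.
    """
    per_process = bucket_size * global_buffers
    total_address_space = per_process * accessors
    max_physical = per_process  # shared, so it does not grow with accessors
    return per_process, total_address_space, max_physical

# Five accessors is a made-up figure for illustration.
print(global_buffer_pages(bucket_size=2, global_buffers=30, accessors=5))
```

Address space grows with the number of accessors, while the physical page ceiling stays at the figure for a single accessor.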

2.6.4 Monitoring RMS Cache Hits

VMS Version 5.0 and higher provide a utility for monitoring RMS buffer caching activity.

2.6.5 Statistics Option

To perform RMS monitoring, the file to be monitored must first have the statistics option set. The statistics option takes up a small amount of space in the file header. However, there is no overhead in collecting statistics because VMS has always collected this data; earlier versions simply did not allow the user to see it.

In order to SET the statistics option on a file, the file must be closed. To set statistics on the file MYFILE.DAT, the following DCL command is used:

 
  
  $ SET FILE myfile.dat/STATISTICS  
  

After the statistics option has been set on the file, the following MONITOR command is used:

 
  
  $ MONITOR RMS/FILE=myfile.dat/ITEM=CAC  
  

The Digital-provided MONITOR RMS utility provides both LOCAL and GLOBAL buffer caching information. The higher the cache hit percent shown in the display, the better the I/O performance of the file.

Example 2-1 MONITOR RMS Utility

 
  
                      OpenVMS Monitor Utility  
                        RMS CACHE STATISTICS  
                           on node TTI  
                         1-DEC-1992 21:52:11  
(Index)  SALES_MASTER.DAT;1  
Active Streams:   2           CUR        AVE        MIN        MAX  
  
  Local Cache Hit Percent    37.00      36.65       0.00      40.00  
  Local Cache Attempt Rate   51.16       5.53       0.00      51.16  
  Global Cache Hit Percent   57.00      57.02       0.00     100.00  
  Global Cache Attempt Rate  31.89       3.50       0.00      31.89  
  Global Buf Read I/O Rate   13.95       1.48       0.00      13.95  
  Global Buf Write I/O Rate   0.00       0.00       0.00       0.00  
  Local Buf Read I/O Rate     0.00       0.02       0.00       0.33  
  Local Buf Write I/O Rate    0.00       0.00       0.00       0.00  
  

If only global buffers are set, the Local Cache Hit Percent will be zero because VMS looks in the local buffers before looking in the global buffers. If the requested data is not in the local buffers, the global buffers are searched for the data. If the data is not in the global buffers, VMS gets the data from disk. VMS then puts the data in a global buffer since the global buffers were the last place VMS checked for the data.
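
The search order just described can be sketched as a toy model in Python. It is not RMS internals: buffer counts, eviction, and locking are ignored, and the names read_bucket, local, shared, and stats are invented for the example.

```python
def read_bucket(key, local, shared, disk, stats):
    """Look in local buffers, then global buffers, then the disk."""
    if key in local:
        stats["local_hit"] += 1
        return local[key]
    if key in shared:
        stats["global_hit"] += 1
        return shared[key]
    stats["disk_read"] += 1
    data = disk[key]
    shared[key] = data  # cached in the last place that was searched
    return data

stats = {"local_hit": 0, "global_hit": 0, "disk_read": 0}
local, shared, disk = {}, {}, {"A": 1, "B": 2}
for key in ("A", "B", "A", "A"):
    read_bucket(key, local, shared, disk, stats)
print(stats)
```

In this run the local hit count stays at zero while the repeated reads are served from the global buffers, which mirrors the zero Local Cache Hit Percent described above.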

2.6.6 SYSGEN Parameter Changes

RMS global buffering requires increased use of VMS global pages and global sections. In addition, some RMS related SYSGEN parameters must be changed. The following MINIMUM SYSGEN parameter values are recommended when global buffering is specified:

Table 2-2 SYSGEN Parameter Changes for RMS Global Buffering

SYSGEN Parameter Name    Minimum Value
---------------------    -------------
GBLPAGFIL                16384
RMS_GBLBUFQUO            16384
GBLPAGES                 50000
GBLSECTIONS              800

Both RMS local buffering and global buffering require increased use of VMS locking, address space and synchronization resources. The following MINIMUM SYSGEN parameter values are recommended when either local buffering or global buffering is specified:

Table 2-3 SYSGEN Parameter Changes for RMS Global or Local Buffering

SYSGEN Parameter Name    Minimum Value
---------------------    -------------
IRPCOUNT                 500
LOCKIDTBL                4000
LOCKIDTBL_MAX            16000
PQL_MENQLM               600
RESHASHTBL               2500
SRPCOUNT                 4500
VIRTUALPAGECNT           35000
PQL_MPGFLQUO             35000
PQL_MBYTLM               35000

To view the number of global sections and global pages used you can enter:

 
  
        $ INSTALL:==$INSTALL/COMMAND  
        $ INSTALL LIST/GLOBAL/SUMMARY  
  
        Summary of Local Memory Global Sections  
  
    272 Global Sections Used,  21964/13036 Global Pages Used/Unused  
  

The SYSGEN parameter GBLSECTIONS sets the total number of global sections available on the system.
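
The summary line shown above can be turned into headroom figures with a few lines of Python. This is a sketch that assumes the line always has the shape shown in the sample; the function name parse_summary and the dictionary keys are invented for the example.

```python
import re

SUMMARY = "272 Global Sections Used,  21964/13036 Global Pages Used/Unused"

def parse_summary(line):
    """Extract section and page counts from an INSTALL summary line."""
    match = re.search(
        r"(\d+) Global Sections Used,\s+(\d+)/(\d+) Global Pages", line
    )
    if match is None:
        raise ValueError("line does not look like the summary shown above")
    sections, used, unused = (int(value) for value in match.groups())
    return {
        "sections_used": sections,
        "pages_used": used,
        "pages_unused": unused,
        "pages_total": used + unused,
    }

info = parse_summary(SUMMARY)
percent_used = 100.0 * info["pages_used"] / info["pages_total"]
print(info["sections_used"], info["pages_total"], f"{percent_used:.1f}%")
```

Comparing sections_used with GBLSECTIONS, and pages_total with GBLPAGES, shows how much headroom remains before adding more global buffers.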

2.7 Disk Defragmentation

Disk defragmentation is the process that causes files to become physically contiguous. Contiguous files can be accessed with fewer I/O operations than non-contiguous files. The two ways to defragment a disk are to do a full BACKUP and RESTORE to the target disk or to use a commercially available disk defragmentation product.

2.8 File Defragmentation

If you don't have the time to defragment all of your disks, you can instead defragment your most badly fragmented HOT files one at a time.

VMS provides a way to defragment individual files. There are three steps to the defragmentation process: create a .FDL for the file, customize the .FDL, and then convert and rename the file.

2.8.1 Create a .FDL for the file

A .FDL is a file definition language file. This file can be used with the Digital provided VMS CONVERT utility to defragment a file. To create a .FDL for the file MYFILE.DAT you would use the following DCL command:

 
  
  $ ANALYZE/RMS/FDL MYFILE.DAT  
  

The ANALYZE command creates a file called MYFILE.FDL. The .FDL is a text file containing a description of MYFILE.DAT.

2.8.2 Customize the .FDL file

Using the text editor of your choice, edit the .FDL file and insert the text best_try_contiguous yes as shown:

 
  
------------------------------------------------------------------  
  FILE  
          best_try_contiguous     yes      <--- the inserted text  
          ALLOCATION              nnn  
          ORGANIZATION            xxx  
            .  
            .  
            .  
------------------------------------------------------------------  
  

2.8.3 Converting and Renaming

The Digital provided CONVERT utility can be used to defragment and reorganize your files using a .FDL. Any time you change an .FDL you need to do a convert. To convert and defragment the file MYFILE.DAT, at the DCL prompt enter:

 
  
        $ CONVERT/FDL=myfile.fdl myfile.dat  myfile.new  
  
        $ RENAME  myfile.new  myfile.dat;  ! note the trailing ";"
  


If the CONVERT fails, DO NOT DO THE RENAME. THIS ENSURES THE INTEGRITY OF YOUR ORIGINAL UNCONVERTED FILE.

Be sure to ALWAYS use the /FDL qualifier when doing a CONVERT. If the /FDL qualifier is not used, the CONVERT will not apply the best_try_contiguous yes setting from the .FDL.

