Library /sys$common/syshlp/PCA$HELP.HLB

    With DIGITAL PCA, you can examine how you have split the
    processing of the application between scalar and vector
    processors. You can also analyze how well your application's
    algorithms use the vector processor. Certain programs can run
    significantly faster on computers containing scalar and vector
    processors than on those containing scalar processors alone.
    Programs that use repetitive array and matrix operations can
    run faster on a vector processor because they are the most
    constrained by scalar performance bottlenecks. Programs that
    spend most of their time performing I/O operations, system
    services, or using data types not supported by vector hardware
    (for example, BYTE and LOGICAL) do not benefit as much by being
    executed on a computer with both scalar and vector processors.

1 – Finding Vector Processor Usage

    The Collector provides two data kinds for sampling vector-
    processing information: vector PC sampling and vector CPU
    sampling. You use the SET command, as shown in the following
    example, to enable sampling of PC values for random vector
    instructions:

  PCAC> SET VPC_SAMPLING

    The preceding command enables the sampling of vector PC values
    and shows you where the wall-clock time is being spent in the
    application performing vector instructions. The sampling rate
    defaults to an interval of 10 milliseconds and includes all
    the idle process time associated with running the program. Call
    stack information is collected by default. The following command
    enables vector CPU sampling and lets you examine the particular
    areas of your application where process time is spent performing
    vector instructions:

  PCAC> SET VCPU_SAMPLING

    The sampling rate defaults to an interval of 10 milliseconds
    and includes only the time that the application is running on
    the processor (process clock time). Call stack information is
    collected by default.
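
    The difference between wall-clock time (vector PC sampling) and
    process clock time (vector CPU sampling) can be illustrated
    outside PCA. The following Python sketch is only an analogy: it
    does not use any PCA interface, and the helper name measure is
    hypothetical. It shows that idle time counts toward the wall
    clock but not toward process time.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for the call."""
    wall_start = time.perf_counter()   # wall-clock time, includes idle time
    cpu_start = time.process_time()    # CPU time charged to this process only
    work()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)


# Sleeping is idle time: it adds to the wall clock but barely to CPU time.
wall, cpu = measure(lambda: time.sleep(0.05))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

    A routine that looks expensive under vector PC sampling but
    cheap under vector CPU sampling is therefore likely to be
    waiting rather than computing.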

    When you sample the vector PC values, you can determine the
    scalar/vector parallelism throughout your entire program. The
    collection of vector PC or CPU sampling data provides you with
    the following information:

    o  The program counter of the vector instruction

    o  The program relative timestamp

    o  The vector instruction opcode

    o  The vector stride

    o  The vector control word (instruction dependent)

    o  The vector length register

    o  The vector mask register

    o  Call stack information (optionally)
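
    For readers who post-process such data, the items above can be
    pictured as one record per sample. The following Python sketch
    is an assumption made for illustration: the class name, field
    names, and types are hypothetical and do not describe the
    actual layout of a PCA data file.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VectorSample:
    """One vector PC or CPU sample (hypothetical field names)."""
    pc: int                  # program counter of the vector instruction
    timestamp_ms: int        # program-relative timestamp
    opcode: str              # vector instruction opcode
    stride: int              # vector stride
    control_word: int        # vector control word (instruction dependent)
    vlr: int                 # vector length register
    vmr: int                 # vector mask register
    call_stack: Optional[Tuple[int, ...]] = None  # present only if collected


# Example values are invented for illustration.
sample = VectorSample(pc=0x0200, timestamp_ms=10, opcode="VLDL",
                      stride=4, control_word=0, vlr=64, vmr=0)
print(sample)
```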

2 – Collecting Concurrent Scalar and Vector Sampling

    You can collect both scalar and vector PC samples during a
    collection run. The timer intervals must be the same for both
    types of PC sampling. If you have set different intervals
    for each, the Collector uses the timer interval of the last
    sampling command entered. The following example shows setting
    the timer interval to 20 milliseconds for CPU sampling, and 100
    milliseconds for vector CPU sampling.

  PCAC> SET CPU_SAMPLING/INTERVAL:20
  PCAC> SET VCPU_SAMPLING/INTERVAL:100

    In the preceding example, the interval for both CPU sampling
    and vector CPU sampling is set to 100 milliseconds, because the
    SET VCPU_SAMPLING command was entered last.
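
    The rule that the last sampling command entered determines the
    shared timer interval can be modeled in a few lines. The
    following Python sketch is a model of that rule only; the
    function name effective_interval and the tuple format are
    hypothetical and are not part of PCA.

```python
def effective_interval(commands):
    """Map each sampling kind to the one interval (ms) that applies.

    commands is a list of (data_kind, interval_ms) pairs in the order
    they were entered; the interval entered last applies to every kind.
    """
    if not commands:
        return {}
    last_interval = commands[-1][1]
    return {kind: last_interval for kind, _ in commands}


settings = effective_interval([("CPU_SAMPLING", 20),
                               ("VCPU_SAMPLING", 100)])
print(settings)  # {'CPU_SAMPLING': 100, 'VCPU_SAMPLING': 100}
```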

3 – Counting Vector Processor Instructions

    You can instruct the Collector to count all vector processor
    instructions in all or in part of an application with the SET
    VCOUNTERS command. From this information, you can determine to
    what extent the vector processor is used. You must specify at
    least one nodespec to indicate the domain of the data collected.

    The following example collects vector instruction counts for
    an entire program using the nodespec PROGRAM_ADDRESS BY
    VINSTRUCTION:

  PCAC> SET VCOUNTERS PROGRAM_ADDRESS BY VINSTRUCTION

    The following example collects vector instruction counts for
    routine XYZ using the nodespec ROUTINE BY VINSTRUCTION:

  PCAC> SET VCOUNTERS ROUTINE XYZ BY VINSTRUCTION

    See the Command Dictionary in the Guide to DIGITAL PCA for a
    complete list of available nodespecs with the SET VCOUNTERS
    command.

4 – Analyzing Vector Processor Data

    The Analyzer plots and displays the results of the vector
    instructions data gathered in the Collector. You can use three
    views to aid in the analysis of the scalar/vector processor
    parallelism: Table, Histogram, and Annotated Source.

    You can set the data kind to any of the following, depending
    on what was gathered by the collection run:

    o  Vector instructions counted

    o  Vector PC sampling

    o  Vector CPU sampling

    The following additional domains are available with vector
    instruction analysis:

    o  INSTRUCTIONS-Sets the domain to be the vector instruction
       found at the sampled or counted PC.

    o  VLENGTH-Sets the domain to be the Vector Length Register (VLR)
       values

    o  VMASK-Sets the domain to be the Vector Mask Register (VMR)
       values

    o  VOPCODE-Sets the domain to be specific vector instructions

    o  VOPERATIONS-Sets the domain to be the number of operations per
       Vector instruction

    o  VREGISTERS-Sets the domain to be the Vector Register usage

    o  VSTRIDE-Sets the domain to be the Vector Stride values

5 – Finding Most Used Vector Instructions

    In the INSTRUCTION domain, to determine which vector instructions
    are used most by your program, enter the following command line:

  PCAA> PLOT/VCOUNTERS INSTRUCTION BY VOPCODE

    This command causes the report view to be based on the
    disassembled opcode for each vector instruction in the entire
    application that is counted. The number of times a vector
    instruction is used lets you see if your application is spending
    a lot of time performing certain operations. For example, if you
    see that the SYNC vector instruction is executed more than any
    other vector instruction, you can infer that the scalar processor
    is spending too much idle time waiting for the vector processor
    to finish an operation.
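
    The kind of tally that such a plot presents can be sketched as
    follows. This Python example is an illustration under stated
    assumptions: the function name opcode_histogram and the opcode
    list are invented, and it does not reproduce the Analyzer's
    actual output format.

```python
from collections import Counter


def opcode_histogram(opcodes):
    """Return (opcode, count, percent) rows, most frequent first."""
    counts = Counter(opcodes)
    total = sum(counts.values())
    return [(op, n, 100.0 * n / total) for op, n in counts.most_common()]


# Invented data: one opcode name per counted vector instruction.
counted = ["SYNC", "VLDL", "SYNC", "VVADDL", "SYNC", "VSTL"]
rows = opcode_histogram(counted)
for op, n, pct in rows:
    print(f"{op:8s} {n:4d} {pct:5.1f}%")

# A dominant SYNC row suggests the scalar processor is often waiting.
top_opcode = rows[0][0]
```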

6 – Finding Where Vector Instructions are Used

    To find where in your program you are using vector instructions,
    use the following command:

  PCAA> PLOT/VCOUNTERS PROGRAM_ADDRESS BY VINSTRUCTION

    This command displays the address of each vector instruction
    that is used in your program and shows what percentage of program
    execution time is spent on each instruction.

7 – System Configurations

    The following illustrates the possible system configurations and
    their effect on performance:

    o  CPU1 and CPU2 with VVIEF support:

       Efficient for program development, but can be 3-5 times slower
       than scalar performance. Cost-effective for parallel
       applications that do not use vector processing.

    o  CPU1 - CPU2 with Vector processor:

       Efficient vector performance. As soon as a process issues its
       first vector instruction, VMS schedules it only on the vector-
       present (VP) CPU2. If the process is executing on CPU1, VMS
       swaps it out and gives it to CPU2. If CPU2 is not free, the
       process waits for it to become free; VMS does not use VVIEF
       on this system.

    o  CPU1 and CPU2:

       Fatal to vector programs. A program fails when it issues its
       first vector instruction, because neither VVIEF nor a vector
       processor is present.

    o  CPU1 and CPU2 with Vector processors:

       Most efficient and cost-effective parallel-vector performance.

    o  CPU1 and CPU2 - CPU3 and CPU4 with Vector processors:

       Efficient parallel-vector performance.

7.1 – VVIEF on VAX Multiprocessors

    If no vector-present CPU is available, OpenVMS executes vector
    instructions using the VAX Vector Instruction Emulator Facility
    (VVIEF), which is much slower than scalar execution.

                                   NOTE

       VVIEF must be enabled on the OpenVMS system; it is disabled
       by default. To enable VVIEF, the system manager must execute
       the command file SYS$UPDATE:VVIEF$INSTAL.COM. For more
       information, refer to your OpenVMS documentation set.
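
    The configurations described in this topic reduce to a simple
    decision about how a vector instruction is handled. The
    following Python sketch models that decision only; the names
    Outcome and vector_outcome are hypothetical, and the code does
    not query a real system.

```python
from enum import Enum


class Outcome(Enum):
    VECTOR_HARDWARE = "runs on a vector-present CPU"
    EMULATED = "emulated by VVIEF (much slower than scalar execution)"
    FAILS = "fails at the first vector instruction"


def vector_outcome(vector_present_cpus, vvief_enabled):
    """Classify how a vector program is handled on a configuration."""
    if vector_present_cpus > 0:
        # VVIEF is not used when a vector-present CPU exists.
        return Outcome.VECTOR_HARDWARE
    if vvief_enabled:
        return Outcome.EMULATED
    return Outcome.FAILS


print(vector_outcome(0, True).value)
print(vector_outcome(0, False).value)
print(vector_outcome(1, False).value)
```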