Included below are some frequently asked questions about PCA and
their answers.
1 – 80% of Time Spent in P1 Space
Why is 80% of my program in P1 space? How do I get the wait time
reflected in code I can change?
When your program is waiting for a system service to complete,
the program counter points to a location in the system service
vector in P1 space. Since the most common form of system service
wait is waiting for an I/O operation to complete, your program
thus appears to be spending most of its time in P1 space.
If your program does a lot of terminal I/O, you should expect the
program to be I/O-bound and to appear to spend a lot of time
in P1 space; the terminal is a slow device. If your program
primarily does disk or tape I/O and appears to spend a lot of
time in P1 space, you should investigate why the program is I/O-
bound. By reprogramming your program's I/O to reduce the I/O
wait-time, you may be able to speed up your program considerably.
To get the system service wait time reflected in the code of your
own program, you should gather stack PC values using the STACK_
PCS command in the Collector, then use the /MAIN_IMAGE qualifier
on a PLOT or TABULATE command in the Analyzer. This will charge
the time outside your image (including that spent in P1 space)
to the actual location within your image that caused it to be
spent.
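For example, a session along the following lines (a sketch; it
assumes SET PC_SAMPLING and SET STACK_PCS are the Collector
commands that enable PC sampling and stack PC collection, and the
nodespec is illustrative):

  PCAC> SET PC_SAMPLING
  PCAC> SET STACK_PCS
  PCAC> GO

Then, in the Analyzer:

  PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE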
2 – Charging Back Shareable Image Data Points
How do I get the time spent in shareable images to be charged to
the parts of my program that used it?
Gather STACK_PCs in the Collector, and use the /MAIN_IMAGE
qualifier on your PLOT or TABULATE commands in the Analyzer. This
will charge the time spent outside your image to the PC within
the image that caused it to be spent.
3 – Charging Back RTL Data Points
How do I get the time spent in a specific RTL to be charged to
the parts of my program that used it?
Gather STACK_PCs in the Collector, then use the /MAIN_
IMAGE=SHARE$mumbleRTL and /STACK=n qualifiers on the PLOT or
TABULATE commands in the Analyzer.
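For example, to charge time spent in LIBRTL back to your own code
(LIBRTL is used here only as an illustrative substitution for
"mumble", and /STACK=1 is an arbitrary depth):

  PCAA> PLOT/MAIN_IMAGE=SHARE$LIBRTL/STACK=1 PROGRAM BY ROUTINE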
4 – Analyzing Individual Instructions
How can I find the specific instructions within a line that are
taking the most time?
Use PLOT LINE module_name\%LINE nnn BY BYTE. Then, look at a
machine listing to correlate byte offsets from the beginning of
the line to the specific instructions.
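For example, for line 120 of a module named SORTER (both
hypothetical):

  PCAA> PLOT LINE SORTER\%LINE 120 BY BYTE

The byte offsets in the resulting plot can then be matched
against a machine-code listing of that module.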
5 – Getting Rid of Terminal I/O
How do I get rid of all the time spent in terminal I/O?
Place an event marker before each terminal I/O statement and
a different event marker after the terminal I/O statement.
Then use SET FILTER foo TIME <> the_first_event_marker_name in
the Analyzer. This will discard all the time spent waiting for
terminal I/O.
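For example, if the markers are named BEFORE_TTY_IO and
AFTER_TTY_IO and the filter is named NO_TTY (all hypothetical
names):

  PCAA> SET FILTER NO_TTY TIME <> BEFORE_TTY_IO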
6 – ACCVIO in Program Run with PCA
Why does my program ACCVIO when linked with PCA?
If the PCAC> prompt never appears, the Collector has probably
not been installed as a privileged image. Possibly, the system
manager forgot to edit the system start-up file to include
@SYS$MANAGER:PCA$STARTUP. If the PCAC> prompt does appear, then
see question 7.
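In the former case, the line the system manager needs to add to
the site-specific start-up procedure (the procedure name varies
with the OpenVMS version; SYS$MANAGER:SYSTARTUP_VMS.COM is
typical) is simply:

  $ @SYS$MANAGER:PCA$STARTUP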
7 – PCA Changes Program Behavior
Why does my program behave differently when run with PCA?
One of the following conditions probably exists:
1. Uninitialized stack variables
2. Dependence on memory above SP
3. Assumptions about memory allocation
Cases 1 and 2 occur because PCA comes in as a handler and uses
the stack above the user program's stack. Consequently, the
stack is manipulated in ways that differ from a run without
PCA. Although it is unlikely, compiler code-generation bugs have
also caused this sort of behavior.
Case 3 above occurs because PCA now lives in the process memory
space and requests memory by means of SYS$EXPREG. PCA requests a
large amount of memory at initialization to minimize changes to
memory allocation, but such changes can still occur.
PCA may have a bug that smashes the stack or random user
memory. HP appreciates your input because these bugs are
hard to track down, and because they have been known to come and
go based on the order of modules in a linker options file.
HP recommends that you try the following:
- Try GO/NOCOLLECT. If the program malfunctions, the problem
lies in your LINK with PCA.
- Try a run with simple PC sampling only. PC sampling is the
least likely mode for bugs. If your program malfunctions,
the problem probably lies with your program. If your program
doesn't malfunction, the problem probably lies with PCA.
If you are still convinced that PCA is changing the behavior of
your program:
If you have a support contract, contact your HP support
representative. Otherwise, contact your HP account representative
or your authorized reseller.
When reporting a problem, please include the following
information:
o The versions of PCA and of the OpenVMS operating system being
used.
o As complete a description of the problem as possible, trying
not to overlook any details.
o The problem reduced to as small a size as possible.
o If the problem is with the collector:
- Does the program run with GO/NOCOLLECT?
- Does the program run with the OpenVMS debugger?
- If Counters, Coverage, or Events are involved, does the
program behave properly when breakpoints are put in the
same locations with the OpenVMS debugger?
- Please supply the version of the compiler(s) used.
- Files needed to build the program, including build
procedures.
o If the problem is with the Analyzer:
- The .PCA file involved
- The sources referenced by the .PCA file
- A PCA initialization file to reproduce the problem.
o All files should be submitted on machine-readable media
(magnetic tape preferred, floppy diskette, or tape cassette).
o Any stack dumps that occurred, if applicable.
8 – Creating a Data File Takes a Long Time
Why does it take so long to create a performance data file?
The Collector copies the portions of the DST it needs to the
performance data file. This can take some time for large
programs. The DST is placed in the performance data file to
avoid confusion over which image contains the DST for the data
gathered. Also, PCA does not need all the information in the DST
and condenses it. This avoids the overhead of reading useless
information every time the file is used.
9 – Using PCA in a Batch Process
Will PCA run in batch?
Yes. However, you should avoid using screen mode.
10 – Avoiding Recompilation
My application takes 10 days to compile. Is there a way I can
avoid compiling my whole application with /DEBUG?
Yes. PCA will provide all functionality except annotated source
listings and codepath analysis, as long as the objects contain
traceback information, because most of the DST information that
PCA needs is there. Once you find which modules are of interest,
you can compile those with /DEBUG, then relink the application
and gather the data again.
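For example, if module HOTSPOT turns out to be interesting (the
module name, language, and relink procedure are hypothetical):

  $ FORTRAN /DEBUG HOTSPOT.FOR
  $ @BUILD_APPLICATION       ! relink the application exactly as before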
11 – CPU Time Stamp Explained
What exactly is a CPU time stamp?
The time stamp found in the PCA performance data file always
expresses the CPU time from the start of the current program
execution. In the data file, it is represented as 10-millisecond
increments (number of CPU ticks), but to the user, it is always
presented as milliseconds. This CPU time represents the total
amount of CPU time consumed by the program and by the Collector
from the time the program started executing.
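For example, a stored value of 250 ticks represents 250 * 10 =
2500 milliseconds of accumulated CPU time.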
12 – CALL Instruction Gets a Spike
PCA tells me that a large amount of time is being spent at a
CALL instruction. Why? The CALL instruction should only consume a
small part of the time spent executing the routine.
First, check page faulting. Sometimes the faulting behavior of
a program causes a routine that is called only moderately often
to be paged out just before it is called. If that is not the
case, check for JSB linkages to RTL routines.
For performance reasons, some RTL routines use JSB linkages. This
can cause confusion for the user when the /MAIN_IMAGE qualifier
is used. This is especially true with PC sampling data, but can
occur with any kind of data for which you can gather stack PC
data.
Because a JSB linkage does not place a call frame on the stack,
the return address to the site of the call is lost to PCA.
Consequently, the first return address found by /MAIN_IMAGE is
the site of the call to the routine that called the RTL by means
of a JSB linkage. As an example, suppose routine MAIN called
routine FOO which in turn called the RTL via a JSB linkage.
Then, suppose that a PC sampling hit occurred in the RTL. This
will cause the PC of the call to FOO and the PC of the call to
MAIN to be recorded. Thus, in the presence of the /MAIN_IMAGE
qualifier, the first PC within the image is the PC of the call to
FOO. Consequently, FOO's call site will be inflated by the number
of data points in the RTL that are in routines which have JSB
linkages.
Note that the above can yield useful information. If you compare
the time with /MAIN to the time without /MAIN, you can tell how
much time was spent in JSB linkage routines. You cannot, however,
separate the various JSB linkage routines. Note further that if
the JSB routine is called from the main program, the data points
will be lost because there is no caller of the main program.
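For example, comparing the following two plots (the nodespec is
illustrative) gives an estimate of the time spent in JSB-linkage
routines, as described above:

  PCAA> PLOT PROGRAM BY ROUTINE
  PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE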
13 – 0.0% With ******** in Plots
Why does the Analyzer report 0.0% for a line and then output a
full line of stars, indicating that the line was covered?
Probably the total number of data points is over 2000, and the
percentage is less than 0.05%. Therefore, rounding makes it
0.0%.
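For example, with 2500 total data points, a line that received a
single data point accounts for only 0.04%, which is displayed as
0.0% even though the stars show that the line was hit.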
14 – Optimizing Calls To Utility Routines
I have a utility routine which I have optimized as much as I can.
I need to know who is calling it and how often, so I can reduce
the number of calls to it. How do I get this information?
Use /MAIN_IMAGE=utility-routine on the PLOT command. This gives
you the following options:
- PLOT CALL_TREE BY CHAIN_ROUTINE will list all the call chains
which pass through utility-routine with the number of data
points for each call chain.
- PLOT/STACK=1 PROGRAM BY ROUTINE will list all the callers of
utility-routine.
- If one particular caller of utility-routine is of interest,
try the following:
PCAA> SET FILTER filter-name CHAIN=(*,caller,utility-routine,*)
This ensures that only data points whose call chains contain the
subchain caller,utility-routine are viewed.
- Many other combinations of /CUMULATIVE, /MAIN_IMAGE, /STACK
with various filters and nodespecs may be useful.
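For example, with a utility routine named PARSE_ARGS and a caller
of interest named CMD_LOOP (both hypothetical):

  PCAA> PLOT/MAIN_IMAGE=PARSE_ARGS CALL_TREE BY CHAIN_ROUTINE
  PCAA> PLOT/MAIN_IMAGE=PARSE_ARGS/STACK=1 PROGRAM BY ROUTINE
  PCAA> SET FILTER ONE_CALLER CHAIN=(*,CMD_LOOP,PARSE_ARGS,*)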
15 – FAULT_ADDRESS Data Kind
What information does the /FAULT_ADDRESS data kind provide?
When a page fault occurs, two virtual addresses are gathered:
the PC of the instruction and the virtual address which caused
the fault. The CPU time is also gathered. In general, the PC
which caused the fault, i.e., the /PAGEFAULT data kind, is most
significant because PCA can plot this against the PROGRAM_ADDRESS
domain and show where the page faulting is occurring in your
program.
The /FAULT_ADDRESS data kind can also be plotted against the
PROGRAM_ADDRESS domain to find where the page faulting is
occurring. This can be useful in laying out CLUSTERs for the
link. (Note that these same page faults should also show up at
the branch or call instructions when plotting /PAGEFAULT.)
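A sketch, treating /PAGEFAULT and /FAULT_ADDRESS as data-kind
qualifiers on PLOT as described above (the nodespec is
illustrative):

  PCAA> PLOT/PAGEFAULT PROGRAM_ADDRESS BY MODULE
  PCAA> PLOT/FAULT_ADDRESS PROGRAM_ADDRESS BY MODULE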
16 – Stack PC Data and Page Fault Data
Why can't I do stack PC analysis with page fault data?
While the Collector is gathering page fault data, walking the
stack might cause additional page faults. This problem has not
been addressed for the current release of OpenVMS.
17 – Routines and Coverage Data
Why am I getting several coverage data points associated with my
routine declaration when I do COVERAGE BY CODEPATH?
Several languages generate prologue code at each routine entry
to initialize the language-specific semantics. As far as PCA
is concerned, code is code and deserves code path analysis.
This environment is usually set up by a CALLS or JSB to an RTL
routine. PCA considers CALLS, CALLG and JSB to be transfers of
control, because control does not in principle have to come back,
and places a BPT at the instruction following the transfer.
18 – HEX Numbers in CALL_TREE Plots
Why are HEX numbers showing up in the CALL_TREE plot?
The Analyzer was not able to symbolize the return address it
found in the call stack. If IO_SERVICES or SERVICES data is being
gathered, these may be addresses in the relocated system service
vector.
19 – Getting Right To Hot Spots
How can I avoid all the source header information and get right
to the most interesting line?
Use the traverse commands, NEXT, FIRST, PREVIOUS, and CURRENT.
20 – Comparing Different Kinds of Data
How can I easily compare different kinds of data?
Use the INCLUDE command.
21 – Virtual Memory in the Analyzer
The Analyzer is running out of virtual memory. What do I do?
Raise the appropriate quotas, limit the number of displays, and
limit the memory used by displays (use /SIZE=n).
Limit the size of your plots with the following methods:
- Use limiting nodespecs. For example, if PROGRAM_ADDRESS BY
LINE doesn't work, try MODULE foo BY LINE, or ROUTINE fee BY
LINE.
- Use the traverse commands after issuing PLOT/your_qualifiers
PROGRAM_ADDRESS BY MODULE.
- Use the /NOZEROS, /MINIMUM, /MAXIMUM qualifiers.
- Use filters with CALL_TREE nodespecs to reduce the number of
call chains.
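For example (foo and fee are the placeholder names used above):

  PCAA> PLOT/NOZEROS MODULE foo BY LINE
  PCAA> PLOT ROUTINE fee BY LINE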
22 – Virtual Memory in the Collector
The Collector is running out of virtual memory. What do I do?
If you are doing coverage or counter analysis, limit the number
of breakpoint settings by using either MODULE BY LINE or BY
CODEPATH node specifications instead of using PROGRAM_ADDRESS BY
LINE or BY CODEPATH. Then, do several collection runs to gather
the data.
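For example, assuming SET COVERAGE accepts a node specification
as described above (FOO is a placeholder module name):

  PCAC> SET COVERAGE MODULE FOO BY LINE
  PCAC> GO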
23 – Some Plots Appear So Quickly
Why do some plots execute more quickly than others?
Some PLOT commands execute more quickly than others because PCA
uses all available information from the previous plot to produce
the requested one. For example, if you enter PLOT PROGRAM BY
LINE, and then enter PLOT/DESCENDING, PCA will only sort the
previous plot. However, if you use a different nodespec, such
as PLOT ROUTINE foo$bar BY CODEPATH, then PCA must rebuild its
internal tables and read the data again, which takes more time.
In addition, the number of filters and/or buckets you use affects
the time it takes to build a plot. This is because filters affect
the amount of data the Analyzer looks at, and because all buckets
must be searched for each data point.
24 – Missing Subroutine Calls
Why don't I see all of my subroutine calls in a CALL_TREE plot?
It may be that your routine has a JSB linkage. See question 12.
25 – Bad Offsets In MACRO Modules
Why do I get bad offsets when plotting MACRO modules by byte?
When plotting MACRO modules by byte, the offset is actually from
the beginning of the module, including the data psects. You get
bad offsets because the linker moves the psects around based
on the psect attributes. Thus, the offsets you get may have no
relationship to any listing you have. However, if you use PLOT
ROUTINE foo BY BYTE, then the offsets will be from the beginning
of the routine. (This will only work if you have an .ENTRY foo
... directive in your program.)
26 – LIB$FIND_IMAGE_SYMBOL
Can PCA measure shareable images activated "on the fly" with
LIB$FIND_IMAGE_SYMBOL?
Yes, if you relink against the image you want to activate. PCA
uses a structure built by the image activator to find all the
shareable image information it needs. Once you have relinked
against the image, the image activator will know about it and
LIB$FIND_IMAGE_SYMBOL will work.
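For example, if the dynamically activated image is MYSHARE.EXE
(hypothetical), a link along these lines makes it known to the
image activator:

  $ LINK MYPROG, MYSHARE_OPTS/OPTIONS

where the options file MYSHARE_OPTS.OPT contains a line such as:

  SYS$SHARE:MYSHARE.EXE/SHAREABLE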