Included below are some frequently asked questions about PCA and
their answers.
1 – 80% of Time Spent in P1 Space
Why is 80% of my program in P1 space? How do I get the wait time
reflected in code I can change?
When your program is waiting for a system service to complete,
the program counter points to a location in the system service
vector in P1 space. Since the most common form of system service
wait is waiting for an I/O operation to complete, your program
thus appears to be spending most of its time in P1 space.
If your program does a lot of terminal I/O, you should expect the
program to be I/O-bound and to appear to spend a lot of time
in P1 space; the terminal is a slow device. If your program
primarily does disk or tape I/O and appears to spend a lot of
time in P1 space, you should investigate why the program is I/O-
bound. By reprogramming your program's I/O to reduce the I/O
wait-time, you may be able to speed up your program considerably.
To get the system service wait time reflected in the code of your
own program, you should gather stack PC values using the STACK_
PCS command in the Collector, then use the /MAIN_IMAGE qualifier
on a PLOT or TABULATE command in the Analyzer. This will charge
the time outside your image (including that spent in P1 space)
to the actual location within your image that caused it to be
spent.
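For example, a session along the following lines (a sketch; it
assumes SET PC_SAMPLING and SET STACK_PCS are the Collector
commands that enable PC sampling and stack PC collection, and the
nodespec is illustrative):

  PCAC> SET PC_SAMPLING
  PCAC> SET STACK_PCS
  PCAC> GO

Then, in the Analyzer:

  PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE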
2 – Charging Back Shareable Image Data Points
How do I get the time spent in shareable images to be charged to
the parts of my program that used it?
Gather STACK_PCs in the Collector, and use the /MAIN_IMAGE
qualifier on your PLOT or TABULATE commands in the Analyzer. This
will charge the time spent outside your image to the PC within
the image that caused it to be spent.
3 – Charging Back RTL Data Points
How do I get the time spent in a specific RTL to be charged to
the parts of my program that used it?
Gather STACK_PCs in the Collector, then use the /MAIN_
IMAGE=SHARE$mumbleRTL and /STACK=n qualifiers on the PLOT or
TABULATE commands in the Analyzer.
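For example, to charge time spent in LIBRTL back to your own code
(LIBRTL is used here only as an illustrative substitution for
"mumble", and /STACK=1 is an arbitrary depth):

  PCAA> PLOT/MAIN_IMAGE=SHARE$LIBRTL/STACK=1 PROGRAM BY ROUTINE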
4 – Analyzing Individual Instructions
How can I find the specific instructions within a line that are
taking the most time?
Use PLOT LINE module_name\%LINE nnn BY BYTE. Then, look at a
machine listing to correlate byte offsets from the beginning of
the line to the specific instructions.
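For example, for line 120 of a module named SORTER (both
hypothetical):

  PCAA> PLOT LINE SORTER\%LINE 120 BY BYTE

The byte offsets in the resulting plot can then be matched
against a machine-code listing of that module.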
5 – Getting Rid of Terminal I/O
How do I get rid of all the time spent in terminal I/O?
Place an event marker before each terminal I/O statement and
a different event marker after the terminal I/O statement.
Then use SET FILTER foo TIME <> the_first_event_marker_name in
the Analyzer. This will discard all the time spent waiting for
terminal I/O.
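For example, if the markers are named BEFORE_TTY_IO and
AFTER_TTY_IO and the filter is named NO_TTY (all hypothetical
names):

  PCAA> SET FILTER NO_TTY TIME <> BEFORE_TTY_IO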
6 – ACCVIO in Program Run with PCA
Why does my program ACCVIO when linked with PCA?
If the PCAC> prompt never appears, the Collector has probably
not been installed as a privileged image. Possibly, the system
manager forgot to edit the system start-up file to include
@SYS$MANAGER:PCA$STARTUP. If the PCAC> prompt does appear, then
see question 7.
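In the former case, the line the system manager needs to add to
the site-specific start-up procedure (the procedure name varies
with the OpenVMS version; SYS$MANAGER:SYSTARTUP_VMS.COM is
typical) is simply:

  $ @SYS$MANAGER:PCA$STARTUP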
7 – PCA Changes Program Behavior
Why does my program behave differently when run with PCA?
One of the following conditions probably exists:
1. Uninitialized stack variables
2. Dependence on memory above SP
3. Assumptions about memory allocation
Cases 1 and 2 occur because PCA comes in as a handler and uses
the stack above the user program's stack. Consequently, the
stack is manipulated in ways that differ from a run without
PCA. Although it is unlikely, compiler code-generation bugs have
also caused this sort of behavior.
Case 3 above occurs because PCA now lives in the process memory
space and requests memory by means of SYS$EXPREG. PCA requests a
large amount of memory at initialization to minimize changes to
memory allocation, but such changes can still occur.
PCA may have a bug that smashes the stack or random user
memory. HP appreciates your input because these bugs are
hard to track down, and because they have been known to come and
go based on the order of modules in a linker options file.
HP recommends that you try the following:
- Try GO/NOCOLLECT. If the program malfunctions, the problem
lies in your LINK with PCA.
- Try a run with simple PC sampling only. PC sampling is the
least likely mode for bugs. If your program malfunctions,
the problem probably lies with your program. If your program
doesn't malfunction, the problem probably lies with PCA.
If you are still convinced that PCA is changing the behavior of
your program:
If you have a support contract, contact your HP support
representative. Otherwise, contact your HP account representative
or your authorized reseller.
When reporting a problem, please include the following
information:
o The versions of PCA and of the OpenVMS operating system being
used.
o As complete a description of the problem as possible, trying
not to overlook any details.
o The problem reduced to as small a size as possible.
o If the problem is with the collector:
- Does the program run with GO/NOCOLLECT?
- Does the program run with the OpenVMS debugger?
- If Counters, Coverage, or Events are involved, does the
program behave properly when breakpoints are put in the
same locations with the OpenVMS debugger?
- Please supply the version of the compiler(s) used.
- Files needed to build the program, including build
procedures.
o If the problem is with the Analyzer:
- The .PCA file involved
- The sources referenced by the .PCA file
- A PCA initialization file to reproduce the problem.
o All files should be submitted on machine-readable media
(magnetic tape preferred, floppy diskette, or tape cassette).
o Any stack dumps that occurred, if applicable.
8 – Creating a Data File Takes a Long Time
Why does it take so long to create a performance data file?
The Collector copies the portions of the DST it needs to the
performance data file. This can take some time for large
programs. The DST is placed in the performance data file to
avoid confusion over which image contains the DST for the data
gathered. Also, PCA does not need all the information in the DST
and condenses it. This avoids the overhead of reading useless
information every time the file is used.
9 – Using PCA in a Batch Process
Will PCA run in batch?
Yes. However, you should avoid using screen mode.
10 – Avoiding Recompilation
My application takes 10 days to compile. Is there a way I can
avoid compiling my whole application with /DEBUG?
Yes. PCA will provide all functionality except annotated source
listings and codepath analysis, as long as the objects contain
traceback information, because most of the DST information that
PCA needs is there. Once you find which modules are of interest,
you can compile those with /DEBUG, then relink the application
and gather the data again.
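For example, if module HOTSPOT turns out to be interesting (the
module name, language, and relink procedure are hypothetical):

  $ FORTRAN /DEBUG HOTSPOT.FOR
  $ @BUILD_APPLICATION       ! relink the application exactly as before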
11 – CPU Time Stamp Explained
What exactly is a CPU time stamp?
The time stamp found in the PCA performance data file always
expresses the CPU time from the start of the current program
execution. In the data file, it is represented as 10-millisecond
increments (number of CPU ticks), but to the user, it is always
presented as milliseconds. This CPU time represents the total
amount of CPU time consumed by the program and by the Collector
from the time the program started executing.
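For example, a stored value of 250 ticks represents 250 * 10 =
2500 milliseconds of accumulated CPU time.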
12 – CALL Instruction Gets a Spike
PCA tells me that a large amount of time is being spent at a
CALL instruction. Why? The CALL instruction should only consume a
small part of the time spent executing the routine.
First, check page faulting. Sometimes the faulting behavior of
a program causes a routine that is called only moderately often
to be paged out just before it is called. If that is not the
case, check for JSB linkages to RTL routines.
For performance reasons, some RTL routines use JSB linkages. This
can cause confusion for the user when the /MAIN_IMAGE qualifier
is used. This is especially true with PC sampling data, but can
occur with any kind of data for which you can gather stack PC
data.
Because a JSB linkage does not place a call frame on the stack,
the return address to the site of the call is lost to PCA.
Consequently, the first return address found by /MAIN_IMAGE is
the site of the call to the routine that called the RTL by means
of a JSB linkage. As an example, suppose routine MAIN called
routine FOO which in turn called the RTL via a JSB linkage.
Then, suppose that a PC sampling hit occurred in the RTL. This
will cause the PC of the call to FOO and the PC of the call to
MAIN to be recorded. Thus, in the presence of the /MAIN_IMAGE
qualifier, the first PC within the image is the PC of the call to
FOO. Consequently, FOO's call site will be inflated by the number
of data points in the RTL that are in routines which have JSB
linkages.
Note that the above can yield useful information. If you compare
the time with /MAIN to the time without /MAIN, you can tell how
much time was spent in JSB linkage routines. You cannot, however,
separate the various JSB linkage routines. Note further that if
the JSB routine is called from the main program, the data points
will be lost because there is no caller of the main program.
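For example, comparing the following two plots (the nodespec is
illustrative) gives an estimate of the time spent in JSB-linkage
routines, as described above:

  PCAA> PLOT PROGRAM BY ROUTINE
  PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE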
13 – 0.0% With ******** in Plots
Why does the Analyzer report 0.0% for a line and then output a
full line of stars, indicating that the line was covered?
Probably the total number of data points is over 2000, and the
percentage is less than 0.05%. Therefore, rounding makes it
0.0%.
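For example, with 2500 total data points, a line that received a
single data point accounts for only 0.04%, which is displayed as
0.0% even though the stars show that the line was hit.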
14 – Optimizing Calls To Utility Routines
I have a utility routine which I have optimized as much as I can.
I need to know who is calling it and how often, so I can reduce
the number of calls to it. How do I get this information?
Use /MAIN_IMAGE=utility-routine on the PLOT command. This gives
you the following options:
- PLOT CALL_TREE BY CHAIN_ROUTINE will list all the call chains
which pass through utility-routine with the number of data
points for each call chain.
- PLOT/STACK=1 PROGRAM BY ROUTINE will list all the callers of
utility-routine.
- If one particular caller of utility-routine is of interest,
try the following:
PCAA> SET FILTER filter-name CHAIN=(*,caller,utility-routine,*)
This ensures that only data points whose call chains contain the
subchain caller,utility-routine are viewed.
- Many other combinations of /CUMULATIVE, /MAIN_IMAGE, /STACK
with various filters and nodespecs may be useful.
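For example, with a utility routine named PARSE_ARGS and a caller
of interest named CMD_LOOP (both hypothetical):

  PCAA> PLOT/MAIN_IMAGE=PARSE_ARGS CALL_TREE BY CHAIN_ROUTINE
  PCAA> PLOT/MAIN_IMAGE=PARSE_ARGS/STACK=1 PROGRAM BY ROUTINE
  PCAA> SET FILTER ONE_CALLER CHAIN=(*,CMD_LOOP,PARSE_ARGS,*)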
15 – FAULT_ADDRESS Data Kind
What information does the /FAULT_ADDRESS data kind provide?
When a page fault occurs, two virtual addresses are gathered:
the PC of the instruction and the virtual address which caused
the fault. The CPU time is also gathered. In general, the PC
which caused the fault, i.e., the /PAGEFAULT data kind, is most
significant because PCA can plot this against the PROGRAM_ADDRESS
domain and show where the page faulting is occurring in your
program.
The /FAULT_ADDRESS data kind can also be plotted against the
PROGRAM_ADDRESS domain to find where the page faulting is
occurring. This can be useful in laying out CLUSTERs for the
link. (Note that these same page faults should also show up at
the branch or call instructions when plotting /PAGEFAULT.)
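A sketch, treating /PAGEFAULT and /FAULT_ADDRESS as data-kind
qualifiers on PLOT as described above (the nodespec is
illustrative):

  PCAA> PLOT/PAGEFAULT PROGRAM_ADDRESS BY MODULE
  PCAA> PLOT/FAULT_ADDRESS PROGRAM_ADDRESS BY MODULE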
16 – Stack PC Data and Page Fault Data
Why can't I do stack PC analysis with page fault data?
While the Collector is gathering page fault data, walking the
stack might cause additional page faults. This problem has not
been addressed for the current release of OpenVMS.
17 – Routines and Coverage Data
Why am I getting several coverage data points associated with my
routine declaration when I do COVERAGE BY CODEPATH?
Several languages generate prologue code at each routine entry
to initialize the language-specific semantics. As far as PCA
is concerned, code is code and deserves code path analysis.
This environment is usually set up by a CALLS or JSB to an RTL
routine. PCA considers CALLS, CALLG and JSB to be transfers of
control, because control does not in principle have to come back,
and places a BPT at the instruction following the transfer.
18 – HEX Numbers in CALL_TREE Plots
Why are HEX numbers showing up in the CALL_TREE plot?
The Analyzer was not able to symbolize the return address it
found in the call stack. If IO_SERVICES or SERVICES data is being
gathered, these may be addresses in the relocated system service
vector.
19 – Getting Right To Hot Spots
How can I avoid all the source header information and get right
to the most interesting line?
Use the traverse commands, NEXT, FIRST, PREVIOUS, and CURRENT.
20 – Comparing Different Kinds of Data
How can I easily compare different kinds of data?
Use the INCLUDE command.
21 – Virtual Memory in the Analyzer
The Analyzer is running out of virtual memory. What do I do?
Raise the appropriate quotas, limit the number of displays, and
limit the memory used by displays (use /SIZE=n).
Limit the size of your plots with the following methods:
- Use limiting nodespecs. For example, if PROGRAM_ADDRESS BY
LINE doesn't work, try MODULE foo BY LINE, or ROUTINE fee BY
LINE.
- Use the traverse commands after issuing PLOT/your_qualifiers
PROGRAM_ADDRESS BY MODULE.
- Use the /NOZEROS, /MINIMUM, /MAXIMUM qualifiers.
- Use filters with CALL_TREE nodespecs to reduce the number of
call chains.
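For example (foo and fee are the placeholder names used above):

  PCAA> PLOT/NOZEROS MODULE foo BY LINE
  PCAA> PLOT ROUTINE fee BY LINE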
22 – Virtual Memory in the Collector
The Collector is running out of virtual memory. What do I do?
If you are doing coverage or counter analysis, limit the number
of breakpoint settings by using either MODULE BY LINE or BY
CODEPATH node specifications instead of using PROGRAM_ADDRESS BY
LINE or BY CODEPATH. Then, do several collection runs to gather
the data.
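For example, assuming SET COVERAGE accepts a node specification
as described above (FOO is a placeholder module name):

  PCAC> SET COVERAGE MODULE FOO BY LINE
  PCAC> GO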
23 – Some Plots Appear So Quickly
Why do some plots execute more quickly than others?
Some PLOT commands execute more quickly than others because PCA
uses all available information from the previous plot to produce
the requested one. For example, if you enter PLOT PROGRAM BY
LINE, and then enter PLOT/DESCENDING, PCA will only sort the
previous plot. However, if you use a different nodespec, such
as PLOT ROUTINE foo$bar BY CODEPATH, then PCA must rebuild its
internal tables and read the data again, which takes more time.
In addition, the number of filters and/or buckets you use affects
the time it takes to build a plot. This is because filters affect
the amount of data the Analyzer looks at, and because all buckets
must be searched for each data point.
24 – Missing Subroutine Calls
Why don't I see all of my subroutine calls in a CALL_TREE plot?
It may be that your routine has a JSB linkage. See question 12.
25 – Bad Offsets In MACRO Modules
Why do I get bad offsets when plotting MACRO modules by byte?
When plotting MACRO modules by byte, the offset is actually from
the beginning of the module, including the data psects. You get
bad offsets because the linker moves the psects around based
on the psect attributes. Thus, the offsets you get may have no
relationship to any listing you have. However, if you use PLOT
ROUTINE foo BY BYTE, then the offsets will be from the beginning
of the routine. (This will only work if you have an .ENTRY foo
... directive in your program.)
26 – LIB$FIND_IMAGE_SYMBOL
Can PCA measure shareable images activated "on the fly" with
LIB$FIND_IMAGE_SYMBOL?
Yes, if you relink against the image you want to activate. PCA
uses a structure built by the image activator to find all the
shareable image information it needs. Once you have relinked
against the image, the image activator will know about it and
LIB$FIND_IMAGE_SYMBOL will work.
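For example, if the dynamically activated image is MYSHARE.EXE
(hypothetical), a link along these lines makes it known to the
image activator:

  $ LINK MYPROG, MYSHARE_OPTS/OPTIONS

where the options file MYSHARE_OPTS.OPT contains a line such as:

  SYS$SHARE:MYSHARE.EXE/SHAREABLE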