Included below are some frequently asked questions about PCA and their answers.
1 – 80% of Time Spent in P1 Space
Why is 80% of my program in P1 space? How do I get the wait time reflected in code I can change? When your program is waiting for a system service to complete, the program counter points to a location in the system service vector in P1 space. Since the most common form of system service wait is waiting for an I/O operation to complete, your program thus appears to be spending most of its time in P1 space. If your program does a lot of terminal I/O, you should expect the program to be I/O-bound and to appear to spend a lot of time in P1 space; the terminal is a slow device. If your program primarily does disk or tape I/O and appears to spend a lot of time in P1 space, you should investigate why the program is I/O-bound. By reprogramming your program's I/O to reduce the I/O wait time, you may be able to speed up your program considerably. To get the system service wait time reflected in the code of your own program, you should gather stack PC values using the STACK_PCS command in the Collector, then use the /MAIN_IMAGE qualifier on a PLOT or TABULATE command in the Analyzer. This will charge the time outside your image (including that spent in P1 space) to the actual location within your image that caused it to be spent.
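A minimal sketch of the sequence (the SET PC_SAMPLING and SET STACK_PCS command spellings are assumptions based on the description above; the exact Collector syntax may differ in your version):

    PCAC> SET PC_SAMPLING                      ! assumed data-kind command
    PCAC> SET STACK_PCS                        ! gather stack PC values with each data point
    PCAC> GO                                   ! run the program and write the data file
    ...
    PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE   ! charge time outside your image back to the code that caused it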
2 – Charging Back Shareable Image Data Points
How do I get the time spent in shareable images to be charged to the parts of my program that used it? Gather STACK_PCs in the Collector, and use the /MAIN_IMAGE qualifier on your PLOT or TABULATE commands in the Analyzer. This will charge the time spent outside your image to the PC within the image that caused it to be spent.
3 – Charging Back RTL Data Points
How do I get the time spent in a specific RTL to be charged to the parts of my program that used it? Gather STACK_PCs in the Collector, then use the /MAIN_IMAGE=SHARE$mumbleRTL and /STACK=n qualifiers on the PLOT or TABULATE commands in the Analyzer.
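For example, to charge time spent in LIBRTL back to your code (LIBRTL is only an illustrative RTL name; substitute the RTL you are interested in, and pick the stack depth that suits your call structure):

    PCAA> PLOT/MAIN_IMAGE=SHARE$LIBRTL/STACK=1 PROGRAM BY ROUTINE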
4 – Analyzing Individual Instructions
How can I find the specific instructions within a line that are taking the most time? Use PLOT LINE module_name\%LINE nnn BY BYTE. Then, look at a machine listing to correlate byte offsets from the beginning of the line to the specific instructions.
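For example, assuming the hot line is line 210 of a module named SORTER (both names hypothetical):

    PCAA> PLOT LINE SORTER\%LINE 210 BY BYTE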
5 – Getting Rid of Terminal I/O
How do I get rid of all the time spent in terminal I/O? Place an event marker before each terminal I/O statement and a different event marker after the terminal I/O statement. Then use SET FILTER foo TIME <> the_first_event_marker_name in the Analyzer. This will discard all the time spent waiting for terminal I/O.
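As a sketch, assuming the markers were named BEFORE_TTY_READ and AFTER_TTY_READ when you defined them in the Collector, the Analyzer command would look like this (following the syntax above):

    PCAA> SET FILTER NO_TTY TIME <> BEFORE_TTY_READ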
6 – ACCVIO in Program Run with PCA
Why does my program ACCVIO when linked with PCA? If the PCAC> prompt never appears, the Collector has probably not been installed as a privileged image. Possibly, the system manager forgot to edit the system start-up file to include @SYS$MANAGER:PCA$STARTUP. If the PCAC> prompt does appear, then see question 7.
7 – PCA Changes Program Behavior
Why does my program behave differently when run with PCA? One of the following conditions probably exists:
- Uninitialized stack variables
- Dependence on memory above SP
- Assumptions about memory allocation
Cases 1 and 2 occur because PCA comes in as a handler and uses the stack above the user program's stack. Consequently, the stack is manipulated in ways that are different from a run without PCA. Although it is unlikely, compiler code generation bugs have also caused this sort of behavior. Case 3 occurs because PCA lives in the process memory space and requests memory by means of SYS$EXPREG. PCA requests a large amount of memory at initialization to minimize the altering of memory allocation, but this still may happen. PCA may also have a bug where it is smashing the stack or random user memory. HP appreciates your input because these bugs are hard to track down, and because they have been known to come and go based on the order of modules in a linker options file. HP recommends that you try the following:
- Try GO/NOCOLLECT. If the program malfunctions, the problem is in your LINK with PCA.
- Try a run with simple PC sampling only. PC sampling is the mode least likely to provoke bugs. If your program malfunctions, the problem probably lies with your program. If your program doesn't malfunction, the problem probably lies with PCA.
If you are still convinced that PCA is changing the behavior of your program: if you have a support contract, contact your HP support representative; otherwise, contact your HP account representative or your authorized reseller. When reporting a problem, please include the following information:
o The versions of PCA and of the OpenVMS operating system being used.
o As complete a description of the problem as possible, trying not to overlook any details.
o The problem reduced to as small a size as possible.
o If the problem is with the Collector:
  - Does the program run with GO/NOCOLLECT?
  - Does the program run with the OpenVMS debugger?
  - If Counters, Coverage, or Events are involved, does the program behave properly when breakpoints are put in the same locations with the OpenVMS debugger?
  - The version of the compiler(s) used.
  - Files needed to build the program, including build procedures.
o If the problem is with the Analyzer:
  - The .PCA file involved
  - The sources referenced by the .PCA file
  - A PCA initialization file to reproduce the problem
o All files should be submitted on machine-readable media (magnetic tape preferred, floppy diskette, or tape cassette).
o Any stack dumps that occurred, if applicable.
8 – Creating Data File Takes Very Long
Why does it take so long to create a performance data file? The Collector copies the portions of the DST it needs to the performance data file. This can take some time for large programs. The DST is placed in the performance data file to avoid confusion over which image contains the DST for the data gathered. Also, PCA does not need all the information in the DST and condenses it. This avoids the overhead of reading useless information every time the file is used.
9 – Using PCA in a Batch Process
Will PCA run in batch? Yes. However, you should avoid using screen mode.
10 – Avoiding Recompilation
My application takes 10 days to compile. Is there a way I can avoid compiling my whole application with /DEBUG? Yes. As long as the objects contain traceback information, PCA provides all functionality except annotated source listings and code path analysis, because traceback records carry most of the DST information PCA needs. Once you find which modules are of interest, you can compile those with /DEBUG, relink the application, and gather the data again.
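A sketch of the cycle, using a hypothetical C module HOTSPOT (any language works the same way; relink with whatever LINK procedure you already use for PCA):

    $ CC HOTSPOT          ! traceback only; enough for most PCA analysis
    ...                   ! gather data, discover that HOTSPOT is the interesting module
    $ CC/DEBUG HOTSPOT    ! full DST for annotated source listings and code path analysis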
11 – CPU Time Stamp Explained
What exactly is a CPU time stamp? The time stamp found in the PCA performance data file always expresses the CPU time from the start of the current program execution. In the data file, it is represented as 10-millisecond increments (number of CPU ticks), but to the user, it is always presented as milliseconds. This CPU time represents the total amount of CPU time consumed by the program and by the Collector from the time the program started executing.
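For example, a stored value of 500 ticks represents 5000 milliseconds (5 seconds) of CPU time consumed since the program started executing.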
12 – CALL Instruction Gets a Spike
PCA tells me that a large amount of time is being spent at a CALL instruction. Why? The CALL instruction should only consume a small part of the time spent executing the routine. First, check page faulting. Sometimes the faulting behavior of a program causes a moderately called routine to get paged out just before it is called. If that isn't the case, check for JSB linkages to an RTL routine. For performance reasons, some RTL routines use JSB linkages. This can cause confusion for the user when the /MAIN_IMAGE qualifier is used. This is especially true with PC sampling data, but can occur with any kind of data for which you can gather stack PC data. Because a JSB linkage does not place a call frame on the stack, the return address to the site of the call is lost to PCA. Consequently, the first return address found by /MAIN_IMAGE is the site of the call to the routine that called the RTL by means of a JSB linkage. As an example, suppose routine MAIN called routine FOO which in turn called the RTL via a JSB linkage. Then, suppose that a PC sampling hit occurred in the RTL. This will cause the PC of the call to FOO and the PC of the call to MAIN to be recorded. Thus, in the presence of the /MAIN_IMAGE qualifier, the first PC within the image is the PC of the call to FOO. Consequently, FOO's call site will be inflated by the number of data points in the RTL that are in routines which have JSB linkages. Note that the above can yield useful information. If you compare the time with /MAIN to the time without /MAIN, you can tell how much time was spent in JSB linkage routines. You cannot, however, separate the various JSB linkage routines. Note further that if the JSB routine is called from the main program, the data points will be lost because there is no caller of the main program.
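A sketch of the comparison described above (PROGRAM BY ROUTINE is just one convenient domain):

    PCAA> PLOT PROGRAM BY ROUTINE              ! time charged where the samples actually landed
    PCAA> PLOT/MAIN_IMAGE PROGRAM BY ROUTINE   ! time charged back into your image

The amount by which a CALL site grows in the second plot approximates the time spent in JSB-linkage RTL routines reached from it.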
13 – 0.0% With ******** in Plots
Why does the Analyzer report 0.0% for a line and then output a full line of stars, indicating that the line was covered? Probably the total number of data points is over 2000, and the percentage is less than 0.05%. Therefore, rounding makes it 0.0%.
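For example, 1 data point out of 2500 is 0.04%, which rounds down to 0.0% even though the stars show the line was covered.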
14 – Optimizing Calls To Utility Routines
I have a utility routine which I have optimized as much as I can. I need to know who is calling it and how often, so I can reduce the number of calls to it. How do I get this information? Use /MAIN_IMAGE=utility-routine on the PLOT command. This gives you the following options:
- PLOT CALL_TREE BY CHAIN_ROUTINE lists all the call chains that pass through utility-routine, with the number of data points for each call chain.
- PLOT/STACK=1 PROGRAM BY ROUTINE lists all the callers of utility-routine.
- If one particular caller of utility-routine is of interest, try the following:
  PCAA> SET FILTER filter-name CHAIN=(*,caller,utility-routine,*)
  This restricts the data being viewed to data points whose call chains contain the subchain caller,utility-routine.
- Many other combinations of /CUMULATIVE, /MAIN_IMAGE, and /STACK with various filters and nodespecs may be useful.
15 – FAULT ADDRESS Data Kind
What information does the /FAULT_ADDRESS data kind provide? When a page fault occurs, two virtual addresses are gathered: the PC of the instruction and the virtual address which caused the fault. The CPU time is also gathered. In general, the PC which caused the fault, i.e., the /PAGEFAULT data kind, is most significant because PCA can plot this against the PROGRAM_ADDRESS domain and show where the page faulting is occurring in your program. The /FAULT_ADDRESS data kind can also be plotted against the PROGRAM_ADDRESS domain to find where the page faulting is occurring. This can be useful in laying out CLUSTERs for the link. (Note that these same page faults should also show up at the branch or call instructions when plotting /PAGEFAULT.)
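As a sketch, assuming the data-kind names above are also the PLOT qualifiers that select them:

    PCAA> PLOT/PAGEFAULT PROGRAM_ADDRESS BY ROUTINE       ! where the faulting instructions are
    PCAA> PLOT/FAULT_ADDRESS PROGRAM_ADDRESS BY ROUTINE   ! which code is being faulted in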
16 – Stack PC Data and Page Fault Data
Why can't I do stack PC analysis with page fault data? While the Collector is gathering page fault data, walking the stack might cause additional page faults. This problem has not been addressed for the current release of OpenVMS.
17 – Routines and Coverage Data
Why am I getting several coverage data points associated with my routine declaration when I do COVERAGE BY CODEPATH? Several languages generate prologue code at each routine entry to initialize the language-specific environment. As far as PCA is concerned, code is code and deserves code path analysis. This environment is usually set up by a CALLS or JSB to an RTL routine. PCA considers CALLS, CALLG, and JSB to be transfers of control, because control does not in principle have to come back, and places a BPT at the instruction following.
18 – HEX Numbers in CALLTREE Plots
Why are HEX numbers showing up in the CALL_TREE plot? The Analyzer was not able to symbolize the return address it found in the call stack. If IO_SERVICES or SERVICES data is being gathered, these may be addresses in the relocated system service vector.
19 – Getting Right To Hot Spots
How can I avoid all the source header information and get right to the most interesting line? Use the traverse commands, NEXT, FIRST, PREVIOUS, and CURRENT.
20 – Comparing Different Kinds of Data
How can I easily compare different kinds of data? Use the INCLUDE command.
21 – Virtual Memory in the Analyzer
The Analyzer is running out of virtual memory. What do I do? Raise the appropriate quotas, limit the number of displays, and limit the memory used by displays (use /SIZE=n). Limit the size of your plots with the following methods:
- Use limiting nodespecs. For example, if PROGRAM_ADDRESS BY LINE doesn't work, try MODULE foo BY LINE, or ROUTINE fee BY LINE.
- Use the traverse commands after issuing PLOT/your_qualifiers PROGRAM_ADDRESS BY MODULE.
- Use the /NOZEROS, /MINIMUM, and /MAXIMUM qualifiers.
- Use filters with CALL_TREE nodespecs to reduce the number of call chains.
22 – Virtual Memory in the Collector
The Collector is running out of virtual memory. What do I do? If you are doing coverage or counter analysis, limit the number of breakpoint settings by using either MODULE BY LINE or BY CODEPATH node specifications instead of using PROGRAM_ADDRESS BY LINE or BY CODEPATH. Then, do several collection runs to gather the data.
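For example, assuming SET COVERAGE is the Collector command you use to request coverage data, one run might be limited like this (FOO is a hypothetical module; cover the remaining modules on later runs):

    PCAC> SET COVERAGE MODULE FOO BY LINE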
23 – Some Plots Appear So Quickly
Why do some plots execute more quickly than others? Some PLOT commands execute more quickly than others because PCA uses all available information from the previous plot to produce the requested one. For example, if you enter PLOT PROGRAM BY LINE, and then enter PLOT/DESCENDING, PCA will only sort the previous plot. However, if you use a different nodespec, such as PLOT ROUTINE foo$bar BY CODEPATH, then PCA must rebuild its internal tables and read the data again, which takes more time. In addition, the number of filters and/or buckets you use affects the time it takes to build a plot. This is because filters affect the amount of data the Analyzer looks at, and because all buckets must be searched for each data point.
24 – Missing Subroutine Calls
Why don't I see all of my subroutine calls in a CALL_TREE plot? It may be that your routine has a JSB linkage. See question 12.
25 – Bad Offsets In MACRO Modules
Why do I get bad offsets when plotting MACRO modules by byte? When plotting MACRO modules by byte, the offset is actually from the beginning of the module, including the data psects. You get bad offsets because the linker moves the psects around based on the psect attributes. Thus, the offsets you get may have no relationship to any listing you have. However, if you use PLOT ROUTINE foo BY BYTE, then the offsets will be from the beginning of the routine. (This will only work if you have an .ENTRY foo ... directive in your program.)
26 – LIB$FIND_IMAGE_SYMBOL
Can PCA measure shareable images activated "on the fly" with LIB$FIND_IMAGE_SYMBOL? Yes, if you relink against the image you want to activate. PCA uses a structure built by the image activator to find all the shareable image information it needs. By relinking against the image, the image activator will know about the image and LIB$FIND_IMAGE_SYMBOL will work.
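For example, if your program activates MYSHR.EXE on the fly, a sketch of the relink (device, directory, and file names are hypothetical; include whatever else your normal PCA link requires):

    $ LINK MYPROG, SYS$INPUT/OPTIONS
    MYDISK:[MYDIR]MYSHR.EXE/SHAREABLE

(At an interactive terminal, end the options input with Ctrl/Z.)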