Use the following:
ja
a.out
ja -clst > acctfile Options - cl = command summary, s = summary,t = terminate
On the basis of information gathered from this report, one may decide which area of the program to concentrate on first. If I/O or memory activity is high, the next step should be to investigate this further with procview. If the code is CPU-bound, it's probably best to move on to profview and flowview, to further examine CPU activity.
Example
Job Accounting - Summary Report
===============================
Job Accounting File Name : /tmp/nqs.+++++03av/.jacct4001
Operating System : sn5227 sn5227 7.0.4.2 roo.5 CRAY Y-MP
User Name (ID) : ccg0002 (1474)
Group Name (ID) : cc (901)
Account Name (ID) : centre (13)
Job Name (ID) : scrip (4001)
Report Starts : 10/11/94 13:40:11
Report Ends : 10/11/94 13:40:18
Elapsed Time : 7 Seconds
User CPU Time : 12.4072 [ 12.4067] Seconds
Multitasking Breakdown
(Concurrent CPUs * Connect seconds = CPU seconds)
--------------- --------------- -----------
1 * 0.1625 = 0.1625
2 * 1.9370 = 3.8740
3 * 2.4148 = 7.2445
(Concurrent CPUs * Connect seconds = CPU seconds)
(Avg.) (total) (total)
-------------------------------------------
2.50 * 4.5143 = 11.2809
System CPU Time : 0.7885 Seconds
I/O Wait Time (Locked) : 1.1013 Seconds
I/O Wait Time (Unlocked) : 0.2744 Seconds
CPU Time Memory Integral : 0.8239 Mword-seconds
SDS Time Memory Integral : 0.0000 Mword-seconds
I/O Wait Time Memory Integral : 0.1312 Mword-seconds
Data Transferred : 0.2458 MWords
Maximum memory used : 0.1406 MWords
Logical I/O Requests : 122
Physical I/O Requests : 97
Number of Commands : 2
Billing Units : 0.0000
setenv NCPUS 1
procstat -R proc.raw a.out
procview -L -Sn proc.raw > proc.report
This produces a report in the file proc.report.
Options:
-R = writes rawfile for procview to create reports
-L = prevents execution of interactive X Windows
-Sn = generates one or more system report files sorted by name
Note:
autotasking is not allowed (NCPUS=1).
Recompilation and reloading are not necessary
There is very little CPU overhead
The procview report shows, among other things, how many bytes were read and written in each file, and the rate at which they were processed. With this knowledge, we can use the assign statement (explained later) to improve the efficiency of I/O activity.rawfile which procview uses to create various report
Example
PROCSTAT FILE REPORT Showing All User Files That Moved Data (Sorted by File Name) (Process ID is: 94616) I/O Max File Bytes Avg I/O Rate Filename Mode Size Processed (Megabytes/Sec) ------------------------------------------------------- stderr WO 0 137 0.0 12 stdout WO 0 2649 0.012 =============================================== Total Files = 2 0 2786 0.012 %
cf77 -Wf"-ez" -l prof prog.f (Compiles program and loads the prof library)
setenv PROF_WPB=1 (Sets the bucket size to 1)
a.out
prof -x a.out > prof.raw (gathers data, taking information from a.out and prof.data)
profview prof.raw (reports information interactively)
OR
profview -LmhDc prof.raw > prof.report (reports to a file)
Notes regarding profview:
The -L flag indicates non-interactive use. Used interactively, as an X Windows session, profview gives a pie chart displaying time spent in each routine, plus a graph of "hits", showing where in memory the most frequently accessed instructions are.
profview requires recompilation and reloading, and uses extra memory. CPU overhead is negligible.
As profiling is reliant on the laws of probability, there may be cases where it is inaccurate. Used in conjunction with the other tools, particularly Flowtracing, however, it can often be a good early indicator of execution "hot-spots". Follow up on these by examining FPP listings of the routines shown by profiling to be expensive.
Note - a statement-level profile cannot be generated using profview, however, this feature is available on the VAX (PCA utility).
Example
PROFILE ENVIRONMENT REPORT
ORIGINAL USER EXECUTION
---------------------------------------
User Program Run on 09/22/94, at 10:18:33
Machine Serial Number 5227, a CRAY Y-MP,
with a Clock Speed of 30000 Picoseconds
CPU Seconds of Execution Time = 0.076868971
Calculated Actual Sample Rate = 143 microseconds
Original Executable Filename = a.out
Timestamp for this File: Thu Sep 22 10:18:16 1994
ORIGINAL PROF EXECUTION
---------------------------------------
Machine Type CRAY Y-MP
Starting Address Area Profiled 0 (octal)
Ending Address 215430
Bucket Size 1 (words)
Requested Sample Rate 512 (microseconds)
Total Hits in All Buckets =537
Estimated CPU Time Used (HITS * SAMPLE RATE) = 0.274944
Date/Time of prof Execution 09/22/94 10:18:50
Executable Filename Read During 'prof' Run = a.out
THIS PROFVIEW EXECUTION
---------------------------------------
profview Version = 70.6
Raw File Being Processed by profview Now = prof.raw
Percent Threshold Used for Summary = 10%
Total Percent Cutoff Used for Details = 95%
Only Active or Selected Symbols Displayed in Detail Report.
Only Library Names are Annotated in reports.
Number of Stars Printed That Represent 100% = 25.
PROFILE STATISTICS REPORT
Showing a Summary of Significant Items
List of Modules with More than 10% Activity
Module Name Hit Count PCT ACCUM%
--------------------------------------------------------
write (system I/O--write)
357 66.48 66.48 ****************
NOTES
Output from the summary option of prof allows one to focus on the important regions of the program quickly.
To determine which section of code is being represented, one looks at the SYMBOL NAME. This is the line contained in the region of code eg S.5000a indicates the code is near the beginning reference to statement label 5000. Long loops may be broken into many different sections eg L.122 is the actual source line number to look at corresponding with the line numbers found in the listing file.
Module % is the % time of the subroutine not the entire program.
Example pie chart produced by profview
Flowtracing is enabled as follows:
cf77 -F prog.f (invokes Flowtrace, creating the flow.data)
a.out
flowview -Luch > flow.report (non-interactive, sorted by inline factor and amount of CPU time)
The following points are relevant:
The -L flag indicates non-interactive use. Used interactively, as an X Windows session, flowview gives a pie chart displaying time spent in each routine, together with a very useful graph of the call tree.
Recompilation and reloading are needed, and Flowtracing incurs CPU overhead (depending on the frequency of routine entry and exits, this may be significant).
If subroutine A calls subroutine B, then time spent in B is properly attributed, unless B is a library routine, in which case it is included in A's timings.
An important statistic provided by Flowtracing is the in-line factor for each routine. This is calculated as the number of times a routine is called divided by the time spent in the routine. If this figure is greater than 1, the routine should be considered a candidate for code inlining (see later notes).
NOTE
Flowtracing is often one of the clearest indicators of program performance. The pie chart should tally approximately with that produced by profiling. For large or unfamiliar codes the call tree can be a very useful source of information.
Example
Flowtrace Environment Report
ORIGINAL USER EXECUTION
-----------------------
User Program Run on 09/22/94, at 13:48:52
Machine Serial Number 5227, a CRAY Y-MP,
with a Clock Speed of 30000 Picoseconds
Program was running in Y mode.
High-water mark for the stack was 4000 (octal words)
or 2048 (decimal words).
Original Executable Filename = a.out
Timestamp for this File: Thu Sep 22 13:48:47 1994
FLOWTRACE STATISTICS
--------------------
Hashing Efficiency = 100.00%, using 13 Hashing Points
To store 13 Routine Names.
Hash Points with.gt. 1 name = 0.
Flowtrace Heap Usage was 1750 (octal words)
or 1000 (decimal words).
The size of the heap was increased 1 time
as Flowtrace was executing.
Flowtrace overhead for:
------------------------
* ALL processing was 53899 (clocks)
or 1.62E-03 (seconds).
Clocks/call was 3170
seconds/call was 9.51E-05.
* ENTER processing was 34664 (clocks)
or 1.04E-03 (seconds).
* EXIT processing was 19235 (clocks)
or 5.77E-04 (seconds).
* First-Time ENTER processing was 100938 (clocks)
or 3.03E-03 (seconds).
THIS FLOWVIEW EXECUTION ----------------------- Flowview version = 70.14 Raw File Being Processed by flowview Now = flow.data Flowtrace Statistics Report Showing Routines Sorted by CPU Time (Descending) (CPU Times are Shown in Seconds) Routine Name Tot Time # Calls Avg Time Percentage Accum% ---------------- -------- -------- -------- ---------- ------- WT 7.00E-02 3 2.33E-02 99.06 99.06******* EXERC6 2.66E-04 1 2.66E-04 0.38 99.44 GENARRAY 9.26E-05 3 3.09E-05 0.13 99.57 SGI 5.53E-05 1 5.53E-05 0.08 99.65 SLD 5.39E-05 1 5.39E-05 0.08 99.72 PLI 4.31E-05 1 4.31E-05 0.06 99.79 PGD 4.21E-05 1 4.21E-05 0.06 99.85 SHORT_VL 3.02E-05 1 3.02E-05 0.04 99.89 PGI 2.49E-05 1 2.49E-05 0.04 99.92 PLD 2.17E-05 1 2.17E-05 0.03 99.95 SLI 2.09E-05 1 2.09E-05 0.03 99.98 SGD 1.17E-05 1 1.17E-05 0.02 100.00 ======================================================== Totals 7.06E-02 16
Example pie chart produced by flowview