CRAY Y-MP EL Student Notes Optimization Techniques, QUB

8 Optimization Techniques

8.1 Job accounting

This UNICOS feature allows the user to see roughly what is going on during code execution. Job accounting should be used with every run, even production runs, as it incurs no CPU overhead and gives a great deal of useful information. This includes a timestamp of the beginning and end of runtime, as well as total elapsed time, CPU time used, and other information such as time spent in I/O operations and memory accesses. If the environment variable, NCPUS, is greater than 1 (default 3, or 4 by early 1994) and autotasking is turned on (-Zp flag to cf77), ja reports on the tasking breakdown, showing how many CPUs were used and for how long.

Use the following:

a.out

ja -clst > acctfile Options - cl = command summary, s = summary,t = terminate

On the basis of information gathered from this report, one may decide which area of the program to concentrate on first. If I/O or memory activity is high, the next step should be to investigate this further with procview. If the code is CPU-bound, it's probably best to move on to profview and flowview, to further examine CPU activity.

Example

Job Accounting - Summary Report

===============================

Job Accounting File Name : /tmp/nqs.+++++03av/.jacct4001

Operating System : sn5227 sn5227 7.0.4.2 roo.5 CRAY Y-MP

User Name (ID) : ccg0002 (1474)

Group Name (ID) : cc (901)

Account Name (ID) : centre (13)

Job Name (ID) : scrip (4001)

Report Starts : 10/11/94 13:40:11

Report Ends : 10/11/94 13:40:18

Elapsed Time : 7 Seconds

User CPU Time : 12.4072 [ 12.4067] Seconds

Multitasking Breakdown

(Concurrent CPUs * Connect seconds = CPU seconds)

--------------- --------------- -----------

1 * 0.1625 = 0.1625

2 * 1.9370 = 3.8740

3 * 2.4148 = 7.2445

(Concurrent CPUs * Connect seconds = CPU seconds)

(Avg.) (total) (total)

-------------------------------------------

2.50 * 4.5143 = 11.2809

System CPU Time : 0.7885 Seconds

I/O Wait Time (Locked) : 1.1013 Seconds

I/O Wait Time (Unlocked) : 0.2744 Seconds

CPU Time Memory Integral : 0.8239 Mword-seconds

SDS Time Memory Integral : 0.0000 Mword-seconds

I/O Wait Time Memory Integral : 0.1312 Mword-seconds

Data Transferred : 0.2458 MWords

Maximum memory used : 0.1406 MWords

Logical I/O Requests : 122

Physical I/O Requests : 97

Number of Commands : 2

Billing Units : 0.0000

8.2 Procview

The procstat/procview combination of commands are used to monitor I/O and memory activity. Procstat produces information about process activity working with standard UNICOS libraries ie statistics about I/O activity, process activity and memory activity. Procview processes the raw-format data generated by procstat. Use the following UNICOS commands:

setenv NCPUS 1

procstat -R proc.raw a.out

procview -L -Sn proc.raw > proc.report

This produces a report in the file proc.report.

Options:

-R = writes rawfile for procview to create reports

-L = prevents execution of interactive X Windows

-Sn = generates one or more system report files sorted by name

Note:

autotasking is not allowed (NCPUS=1).

Recompilation and reloading are not necessary

There is very little CPU overhead

The procview report shows, among other things, how many bytes were read and written in each file, and the rate at which they were processed. With this knowledge, we can use the assign statement (explained later) to improve the efficiency of I/O activity.rawfile which procview uses to create various report

Example

  PROCSTAT FILE REPORT
  Showing All User Files That Moved Data
  (Sorted by File Name)
 
  (Process ID is: 94616)

 
 
   I/O  Max File   Bytes   Avg I/O Rate
 Filename Mode  Size   Processed   (Megabytes/Sec)
 -------------------------------------------------------
 stderr  WO   0   137     0.0               12
 stdout  WO   0   2649        0.012
 
 ===============================================
 Total Files = 2   0   2786      0.012
 %

8.3 Profview

This utility is a statistical analysis of program execution. The program address register is sampled at regular intervals, and a system routine records the address of the instruction currently being executed. Usage is as follows:

cf77 -Wf"-ez" -l prof prog.f (Compiles program and loads the prof library)

setenv PROF_WPB=1 (Sets the bucket size to 1)

a.out

prof -x a.out > prof.raw (gathers data, taking information from a.out and prof.data)

profview prof.raw (reports information interactively)

profview -LmhDc prof.raw > prof.report (reports to a file)

Notes regarding profview:

The -L flag indicates non-interactive use. Used interactively, as an X Windows session, profview gives a pie chart displaying time spent in each routine, plus a graph of "hits", showing where in memory the most frequently accessed instructions are.

profview requires recompilation and reloading, and uses extra memory. CPU overhead is negligible.

As profiling is reliant on the laws of probability, there may be cases where it is inaccurate. Used in conjunction with the other tools, particularly Flowtracing, however, it can often be a good early indicator of execution "hot-spots". Follow up on these by examining FPP listings of the routines shown by profiling to be expensive.

Note - a statement-level profile cannot be generated using profview, however, this feature is available on the VAX (PCA utility).

Example

PROFILE ENVIRONMENT REPORT

ORIGINAL USER EXECUTION

---------------------------------------

User Program Run on 09/22/94, at 10:18:33

Machine Serial Number 5227, a CRAY Y-MP,

with a Clock Speed of 30000 Picoseconds

CPU Seconds of Execution Time = 0.076868971

Calculated Actual Sample Rate = 143 microseconds

Original Executable Filename = a.out

Timestamp for this File: Thu Sep 22 10:18:16 1994

ORIGINAL PROF EXECUTION

---------------------------------------

Machine Type CRAY Y-MP

Starting Address Area Profiled 0 (octal)

Ending Address 215430

Bucket Size 1 (words)

Requested Sample Rate 512 (microseconds)

Total Hits in All Buckets =537

Estimated CPU Time Used (HITS * SAMPLE RATE) = 0.274944

Date/Time of prof Execution 09/22/94 10:18:50

Executable Filename Read During 'prof' Run = a.out

THIS PROFVIEW EXECUTION

---------------------------------------

profview Version = 70.6

Raw File Being Processed by profview Now = prof.raw

Percent Threshold Used for Summary = 10%

Total Percent Cutoff Used for Details = 95%

Only Active or Selected Symbols Displayed in Detail Report.

Only Library Names are Annotated in reports.

Number of Stars Printed That Represent 100% = 25.

PROFILE STATISTICS REPORT

Showing a Summary of Significant Items

List of Modules with More than 10% Activity

Module Name Hit Count PCT ACCUM%

--------------------------------------------------------

write (system I/O--write)

357 66.48 66.48 ****************

NOTES

Output from the summary option of prof allows one to focus on the important regions of the program quickly.

To determine which section of code is being represented, one looks at the SYMBOL NAME. This is the line contained in the region of code eg S.5000a indicates the code is near the beginning reference to statement label 5000. Long loops may be broken into many different sections eg L.122 is the actual source line number to look at corresponding with the line numbers found in the listing file.

Module % is the % time of the subroutine not the entire program.

Example pie chart produced by profview

8.4 Flowview

This tool keeps track of subroutine activity by logging routine entry and exit times, Flowtrace shows how much time is spent in each routine, the number of times each routine was called, and the caller/callee relationships ie it creates a dynamic calling tree eg

Flowtracing is enabled as follows:

cf77 -F prog.f (invokes Flowtrace, creating the flow.data)

a.out

flowview -Luch > flow.report (non-interactive, sorted by inline factor and amount of CPU time)

The following points are relevant:

The -L flag indicates non-interactive use. Used interactively, as an X Windows session, flowview gives a pie chart displaying time spent in each routine, together with a very useful graph of the call tree.

Recompilation and reloading are needed, and Flowtracing incurs CPU overhead (depending on the frequency of routine entry and exits, this may be significant).

If subroutine A calls subroutine B, then time spent in B is properly attributed, unless B is a library routine, in which case it is included in A's timings.

An important statistic provided by Flowtracing is the in-line factor for each routine. This is calculated as the number of times a routine is called divided by the time spent in the routine. If this figure is greater than 1, the routine should be considered a candidate for code inlining (see later notes).

NOTE

Flowtracing is often one of the clearest indicators of program performance. The pie chart should tally approximately with that produced by profiling. For large or unfamiliar codes the call tree can be a very useful source of information.

Example

Flowtrace Environment Report

ORIGINAL USER EXECUTION

-----------------------

User Program Run on 09/22/94, at 13:48:52

Machine Serial Number 5227, a CRAY Y-MP,

with a Clock Speed of 30000 Picoseconds

Program was running in Y mode.

High-water mark for the stack was 4000 (octal words)

or 2048 (decimal words).

Original Executable Filename = a.out

Timestamp for this File: Thu Sep 22 13:48:47 1994

FLOWTRACE STATISTICS

--------------------

Hashing Efficiency = 100.00%, using 13 Hashing Points

To store 13 Routine Names.

Hash Points with.gt. 1 name = 0.

Flowtrace Heap Usage was 1750 (octal words)

or 1000 (decimal words).

The size of the heap was increased 1 time

as Flowtrace was executing.

Flowtrace overhead for:

------------------------

* ALL processing was 53899 (clocks)

or 1.62E-03 (seconds).

Clocks/call was 3170

seconds/call was 9.51E-05.

* ENTER processing was 34664 (clocks)

or 1.04E-03 (seconds).

* EXIT processing was 19235 (clocks)

or 5.77E-04 (seconds).

* First-Time ENTER processing was 100938 (clocks)

or 3.03E-03 (seconds).

  THIS FLOWVIEW EXECUTION
  -----------------------
 Flowview version = 70.14
 Raw File Being Processed by flowview Now = flow.data
 
  Flowtrace Statistics Report
  Showing Routines Sorted by CPU Time (Descending)
  (CPU Times are Shown in Seconds)
 
 Routine Name      Tot Time  # Calls   Avg Time  Percentage   Accum%
 ----------------  --------  --------  --------  ----------  -------
 WT                7.00E-02  3         2.33E-02  99.06       99.06*******
 EXERC6            2.66E-04  1         2.66E-04  0.38        99.44 
 GENARRAY          9.26E-05  3         3.09E-05  0.13        99.57 
 SGI               5.53E-05  1         5.53E-05  0.08        99.65 
 SLD               5.39E-05  1         5.39E-05  0.08        99.72 
 PLI               4.31E-05  1         4.31E-05  0.06        99.79 
 PGD               4.21E-05  1         4.21E-05  0.06        99.85 
 SHORT_VL          3.02E-05  1         3.02E-05  0.04        99.89 
 PGI               2.49E-05  1         2.49E-05  0.04        99.92 
 PLD               2.17E-05  1         2.17E-05  0.03        99.95 
 SLI               2.09E-05  1         2.09E-05  0.03        99.98 
 SGD               1.17E-05  1         1.17E-05  0.02        100.00 
 ========================================================
 Totals  7.06E-02  16

Example pie chart produced by flowview

[Next] [Previous] [Top]

All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk

Generated with CERN WebMaker