The Queen's University of Belfast
Parallel Computer Centre


7 Optimization Strategies


7.1 Preliminary considerations

is optimization worthwhile/needed

does the program use significant resources and which ones

is the program destined for frequent long-term use

how can one get the maximum return (execution speedup) on investment (your time)

what can be learned and applied to the development phase of future programs
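The return on investment can be estimated before any code is changed. The function below is a minimal sketch based on Amdahl's law, which is not named in these notes and is brought in here as an assumption: if a fraction P of the run time is made S times faster and the remainder is untouched, the overall speedup is 1/((1-P) + P/S). The name SPDUP and its arguments are illustrative only.

```fortran
!     Estimated overall speedup when a fraction P (0 to 1) of the
!     run time is made S times faster and the rest is unchanged.
!     SPDUP, P and S are illustrative names, not part of any tool.
      REAL FUNCTION SPDUP(P, S)
      IMPLICIT NONE
      REAL P, S
      SPDUP = 1.0 / ((1.0 - P) + P / S)
      RETURN
      END
```

Under this model, speeding up a routine that takes 80% of the run time by a factor of 4 gives an overall speedup of 2.5, while speeding up a routine that takes 10% of the run time by a factor of 100 gives only about 1.11.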

7.1.1 Step 0 Define a baseline

Determine which routines need optimizing and what type of optimization is necessary. A highly vectorized program is not necessarily running as fast as it could, e.g. memory conflicts or I/O problems may be slowing it down.

gather execution performance statistics for the properly executing code: the job accounting (ja) and procstat utilities give information which should be retained and compared with the results obtained after the optimization effort

maintain separately a working version of the program

do not begin optimization until properly executing code has been obtained

work with a computationally short test case which is representative of the whole problem

7.1.2 Step 1 Determine resource target for optimization

use job accounting, ja, and procstat

determine if the program is I/O bound

determine if the program is vectorized

determine if the program efficiently utilises memory

compare loopmark listings - run the preprocessor (fpp) to see whether it gives additional speedup, e.g.

cf77 -Zv -Wf"-em" *.f

verify the results produced by fpp
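One simple way to verify results is to compare the numerical output of the transformed program with the output of the separately maintained working version. The function below is only a sketch of that idea. The name MAXDIF, its arguments, and the choice of an absolute (rather than relative) difference are assumptions made for illustration.

```fortran
!     Largest absolute difference between a reference result REF
!     (from the retained working version) and a result NEW (from
!     the optimized or preprocessed version). Illustrative names.
      REAL FUNCTION MAXDIF(N, REF, NEW)
      IMPLICIT NONE
      INTEGER N, I
      REAL REF(N), NEW(N)
      MAXDIF = 0.0
      DO 10 I = 1, N
         MAXDIF = MAX(MAXDIF, ABS(REF(I) - NEW(I)))
   10 CONTINUE
      RETURN
      END
```

A result of exactly zero is not always to be expected, since reordering floating-point operations can change the last few digits. The acceptable tolerance depends on the application.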

7.1.3 Step 2 Target subroutines for optimization

loopmark, prof, perftrace

check loopmark for vectorization information

vectorization is often the single most effective method of reducing overall program execution time

7.1.4 Step 3 Determine target loops for vectorization

run prof

look for high percentage loops or regions

look for high percentage regions which are not included in a vectorized inner loop

look for high percentage regions which are vectorized but include complex index calculations and/or divisions

look for high percentage regions which are vectorized, but include IF statements, and/or GOTO branches

NOTE

It may be necessary to restructure the code so that the significant work takes place in the innermost loop rather than outside it. Complex index calculations can slow down a vectorized loop. Loops containing IF statements generally run slower than equivalent loops without them.
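As an illustration of the point about IF statements, the sketch below shows one common restructuring. When the condition does not depend on the loop index, the test can be moved outside the loop so that each loop body is free of branches. The routine names SCALE1 and SCALE2 and the arithmetic are invented for the example.

```fortran
!     Before: the IF is evaluated on every iteration although
!     MODE never changes inside the loop.
      SUBROUTINE SCALE1(N, MODE, X, Y)
      IMPLICIT NONE
      INTEGER N, MODE, I
      REAL X(N), Y(N)
      DO 10 I = 1, N
         IF (MODE .EQ. 1) THEN
            Y(I) = 2.0 * X(I)
         ELSE
            Y(I) = X(I) + 1.0
         END IF
   10 CONTINUE
      RETURN
      END

!     After: the test is made once and each loop body is a
!     single branch-free assignment.
      SUBROUTINE SCALE2(N, MODE, X, Y)
      IMPLICIT NONE
      INTEGER N, MODE, I
      REAL X(N), Y(N)
      IF (MODE .EQ. 1) THEN
         DO 10 I = 1, N
            Y(I) = 2.0 * X(I)
   10    CONTINUE
      ELSE
         DO 20 I = 1, N
            Y(I) = X(I) + 1.0
   20    CONTINUE
      END IF
      RETURN
      END
```

Both routines should produce identical output for the same input. Whether the second is measurably faster depends on the compiler, which may perform this transformation itself.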

7.1.5 Step 4 Recall vector inhibitors

I/O statements

CALL, RETURN, STOP, or PAUSE statements

function references

branching statements (IF, GOTO)

data dependencies and recurrences

ambiguous subscript references

long loops - the aggress option of -Wf can be tried for these, but the results must be verified, e.g.

cf77 -Wf"-o aggress" *.f
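The effect of a recurrence can sometimes be limited by splitting a loop. In the sketch below (routine names FUSED and SPLIT are invented, and N is assumed to be at least 1), the running sum S(I) depends on S(I-1), which is a recurrence. In the split version the independent product is computed in a loop of its own, leaving only the running sum in the loop that carries the dependency.

```fortran
!     Single loop: S(I) needs S(I-1) from the previous iteration,
!     so the recurrence sits in the same loop as the product.
      SUBROUTINE FUSED(N, B, C, A, S)
      IMPLICIT NONE
      INTEGER N, I
      REAL B(N), C(N), A(N), S(N)
      A(1) = B(1) * C(1)
      S(1) = A(1)
      DO 10 I = 2, N
         A(I) = B(I) * C(I)
         S(I) = S(I-1) + A(I)
   10 CONTINUE
      RETURN
      END

!     Split version: the first loop has independent iterations;
!     only the second loop contains the recurrence.
      SUBROUTINE SPLIT(N, B, C, A, S)
      IMPLICIT NONE
      INTEGER N, I
      REAL B(N), C(N), A(N), S(N)
      DO 10 I = 1, N
         A(I) = B(I) * C(I)
   10 CONTINUE
      S(1) = A(1)
      DO 20 I = 2, N
         S(I) = S(I-1) + A(I)
   20 CONTINUE
      RETURN
      END
```

The loopmark listing is the place to check whether the compiler treats the two versions differently. Some compilers split such loops automatically.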

7.1.6 Step 5 Other Optimization Techniques

unroll small loops

inline frequently called subroutines

promote scalars to vectors

use PARAMETER statements to define constants, especially those which define loop lengths

switch the loop order (loop interchange) when appropriate
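Two of these techniques can be sketched as before and after pairs. The routine names and array shapes below are invented for illustration. DOT3A and DOT3B show a small inner loop of length 3 unrolled by hand so that the long loop over I becomes the innermost loop. ADDROW and ADDCOL show two loops interchanged so that the inner loop runs over the first subscript, which in Fortran addresses consecutive memory locations.

```fortran
!     Before: the inner loop has only 3 iterations, so if it is
!     the loop chosen for vectorization the vectors are short.
      SUBROUTINE DOT3A(N, A, X, Y)
      IMPLICIT NONE
      INTEGER N, I, K
      REAL A(N,3), X(3), Y(N)
      DO 20 I = 1, N
         Y(I) = 0.0
         DO 10 K = 1, 3
            Y(I) = Y(I) + A(I,K) * X(K)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

!     After: the short loop is unrolled by hand, leaving the
!     long loop over I as the only (innermost) loop.
      SUBROUTINE DOT3B(N, A, X, Y)
      IMPLICIT NONE
      INTEGER N, I
      REAL A(N,3), X(3), Y(N)
      DO 10 I = 1, N
         Y(I) = A(I,1)*X(1) + A(I,2)*X(2) + A(I,3)*X(3)
   10 CONTINUE
      RETURN
      END

!     Before: the inner loop runs over the second subscript, so
!     successive iterations are N elements apart in memory.
      SUBROUTINE ADDROW(N, M, A, B)
      IMPLICIT NONE
      INTEGER N, M, I, J
      REAL A(N,M), B(N,M)
      DO 20 I = 1, N
         DO 10 J = 1, M
            B(I,J) = B(I,J) + A(I,J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

!     After: loops interchanged, so the inner loop runs over the
!     first subscript and touches consecutive elements.
      SUBROUTINE ADDCOL(N, M, A, B)
      IMPLICIT NONE
      INTEGER N, M, I, J
      REAL A(N,M), B(N,M)
      DO 20 J = 1, M
         DO 10 I = 1, N
            B(I,J) = B(I,J) + A(I,J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END
```

Each pair should give the same results for the same input. Which version is faster, and by how much, has to be measured with the performance tools on a short run.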

7.1.7 Step 6 Other Information

rerun the performance tools regularly, since optimizing one routine can make another routine the dominant cost

perform optimization comparisons on a short run

7.2 Optimising Fortran Programs - Code Analysis

This process may be summarised in the following diagram:

[Diagram: Fortran code analysis process]

7.3 Types of optimization tools

Static Analysis Tools - give compile time information

cross-reference listings (cf77 -Wf"-ex" prog.f creates the listing file prog.l)

program listings and utilities - Loopmark listing and ftref

Dynamic Analysis Tools - give run-time information

ja - job accounting

flowtrace - subroutine level analysis of the program giving the dynamic calling tree

prof and profview - sample the program at a fixed time interval, so the resulting statistics are estimates rather than exact counts

procstat and procrpt
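Because a sampling profiler such as prof reports estimates, it can help to have a rough idea of their uncertainty. The function below is a sketch under a simplifying assumption that is not taken from the prof documentation: the samples are treated as independent, so the fraction P of samples falling in a routine has the binomial standard error sqrt(P(1-P)/NSAMP). The name SMPERR and its arguments are illustrative.

```fortran
!     Approximate standard error of a sampled time fraction P
!     (0 to 1) estimated from NSAMP samples, treating the
!     samples as independent (a simplifying assumption).
      REAL FUNCTION SMPERR(P, NSAMP)
      IMPLICIT NONE
      REAL P
      INTEGER NSAMP
      SMPERR = SQRT(P * (1.0 - P) / REAL(NSAMP))
      RETURN
      END
```

Under this model a routine seen in half of 100 samples has a standard error of about 0.05, i.e. roughly 50% plus or minus 5%, so a very short run gives only a coarse ranking of routines.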

CRAY TOOLS SUMMARY


All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk