The Queen's University of Belfast
Parallel Computer Centre


6 Performance Utilities


'An Overview of the cf77 compiling system'

6.1 cf77 Compilation System Phases

FPP - the dependence analyser: analyses and transforms Fortran source code to make the best use of Cray hardware. In particular it enhances vectorization and automatically detects and exploits parallelism in the code. Parallelism is marked by CMIC$ compiler directives.

FMP - the Autotasking translation phase: invokes the Fortran multitasking translator, which translates Autotasking directives into input for the cft77 compiler.

CFT77 - the compiler proper: compiles Fortran source code into relocatable object code, which can then be loaded by segldr into an executable file.

SEGLDR - the segment loader: converts relocatable object code into an executable binary file.
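The Autotasking directives recognised by fpp and fmp bracket a parallel loop with CMIC$ lines. The following is an illustrative sketch only; the exact directive spellings and clause names vary between software releases, so treat them as assumptions rather than a definitive reference:

```fortran
      SUBROUTINE SCALE(A, B, N)
C     Multiply each element of B by 2 and store the result in A.
      INTEGER N, I
      REAL A(N), B(N)
C     Assumed Autotasking directive: distribute iterations of the
C     following loop across processors.
CMIC$ DO ALL SHARED(A, B, N) PRIVATE(I)
      DO 10 I = 1, N
         A(I) = 2.0 * B(I)
   10 CONTINUE
      END
```

Without the CMIC$ line the loop is still a candidate for vectorization; the directive additionally allows fmp to prepare the loop for multitasking.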

6.1.1 Commands

Syntax: cf77 [options] sourcefile.f

options - specify the invocation of one or more components

The -Z option controls which of the preprocessor (fpp) and midprocessor (fmp) phases are invoked:

-Zp - invokes both fpp and fmp

-Zc - bypasses both fpp and fmp (this is the default option)

-Zv - all phases except fmp

-Zu - all phases except fpp

Example sequence of events invoked by the following command:

cf77 -Zp prog.f

fpp prog.f > prog.m

fmp prog.m > prog.j

cft77 prog.j

rm prog.m prog.j

segldr prog.o

rm prog.o

A single cf77 command driving the whole compiling system:

cf77 -Zp -Wd"fpp options" -Wu"fmp options" -Wf"cft77 options" prog.f

cf77 -Zp -o prog - compiling system options

-Wd"-ei" - dependence analyser (fpp) options

-Wu"-x" - translator (fmp) options

-Wf"-em" - compiler (cft77) options

-Wl"-i dirs" - loader (segldr) options

file.f file1.f - input filenames

Files within the compiling system

*.f = source

*.l = listing

*.s = CAL (Cray Assembly Language)

*.o = compiled code

*.m = code containing Autotasking/microtasking directives

*.j = Fortran code previously processed by fmp

*.a = library files

*.F = code to be processed by the generic preprocessor (GPP)

FPP and FMP are two phases which can significantly improve a program's performance. cft77 recognises many parallel constructs automatically and compiles them for multitasking without requiring additional user input; this is Autotasking. Autotasking cannot do any harm, in that it merely decides whether the program may be broken down into separate tasks.

6.2 Compiler features

INLINING - Inline code expansion

Inlining operates in two modes, explicit or automatic. When a program unit calls a subprogram, the body of the called subprogram is incorporated into the calling program unit (in the resulting binary) at the point where the call occurred. This eliminates the calling overhead and allows vectorization of loops which would otherwise be prevented from vectorizing by the external call. A drawback of inlining is that it may cause an unacceptable increase in the size of the program.
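A loop whose body contains an external call is typically not vectorized; once the call is inlined, the loop body is straight-line code and becomes a vectorization candidate. A minimal sketch (the function name AXPB is invented for illustration):

```fortran
      REAL FUNCTION AXPB(X)
C     Hypothetical example function: 2x + 1.
      REAL X
      AXPB = 2.0 * X + 1.0
      END

C     Before inlining: the external call to AXPB inhibits
C     vectorization of this loop.
      DO 10 I = 1, N
         A(I) = AXPB(B(I))
   10 CONTINUE

C     After inlining, the loop body is straight-line code,
C     equivalent to the vectorizable:
      DO 20 I = 1, N
         A(I) = 2.0 * B(I) + 1.0
   20 CONTINUE
```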

CROSS-COMPILING

This is compiling a program on one system to execute on another. The target system is specified by cpu and hdw arguments, given either on the compiler command line (using -C) or to the operating system (using the TARGET environment variable).
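As a sketch of the environment-variable route (the target name and shell syntax below are assumptions for illustration; consult the local site documentation for the machine names actually recognised):

```
# Hypothetical session: select a target machine for cross-compilation,
# then compile as usual.
setenv TARGET cray-ymp
cf77 -Zv prog.f
```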

6.3 Compiler directives

Compiler directives are lines within the source code that specify actions to be performed by the compiler. They are not Fortran statements, but they are listed in the source statement listing.

The categories of compiler directives are: vectorization control, scalar optimization control, listable output control, localised use of features otherwise specified by command-line options, and storage specification.

FPP and FMP generate directives for the compiler beginning with CDIR@, whereas user-written directives begin with CDIR$. The prefix occupies columns 1 to 5, column 6 is left blank, and the directive itself occupies columns 7 to 72.

A directive in the source program applies only to the program unit in which it appears; the same feature may instead be enabled by a command-line option, in which case it applies to the entire compilation. In general, a CDIR$ directive overrides the command-line option for the same feature.
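For example, the IVDEP directive asserts that apparent dependences in the loop that follows may be ignored, allowing it to vectorize. Note the CDIR$ prefix in columns 1 to 5 and the directive text from column 7 (a sketch; the loop itself is invented for illustration):

```fortran
C     IVDEP: tell the compiler to ignore the apparent vector
C     dependence between A(I+K) and A(I) in the next loop.
CDIR$ IVDEP
      DO 10 I = 1, N - K
         A(I + K) = A(I) + B(I)
   10 CONTINUE
```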

6.4 Loop unwinding and unrolling

Both of these optimization techniques are performed automatically by the compiler and can reduce loop overhead for eligible loops.

Unwinding replaces a loop with several copies of its body, producing straight-line code which can then be vectorized.

Unrolling creates a new version of the loop at the instruction-scheduling level, not in the source, and does not remove the original. The new copy consists of n copies of the loop's computations, i.e. n iterations per pass. This reduces loop overhead and improves scheduling and register assignment in the unrolled loop.
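In source terms, unrolling by a factor of 4 corresponds to the transformation below. The compiler performs this internally; this sketch assumes N is a multiple of 4 for brevity, whereas the compiler also handles the leftover iterations:

```fortran
C     Original loop:
      DO 10 I = 1, N
         S = S + A(I)
   10 CONTINUE

C     Unrolled by 4 (sketch assumes MOD(N,4) = 0):
C     four iterations of work per loop pass, so the loop
C     overhead is paid one quarter as often.
      DO 20 I = 1, N, 4
         S = S + A(I) + A(I+1) + A(I+2) + A(I+3)
   20 CONTINUE
```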


All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk