"Default Buffer Sizes."
Using the -b flag, it is possible to alter buffer sizes to suit your particular needs. For long, sequential reads or writes, a large buffer is in order; if file accesses are short and effectively randomly distributed within the program, it is advisable to reduce the buffer size. The -n flag may be used to set aside system file space for your file, allowing you to choose the stride length. Check the effectiveness of these changes with procview.
If procview or job accounting has highlighted inefficient memory transactions, it may be worthwhile checking the code for array accesses in strides of 2^k, where k > 5 (remember that in Fortran the left-hand index changes most rapidly). If such a stride can be altered, then do so.
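The stride point can be sketched as follows. This is only an illustration: the program name, the array A and the size N = 64 (that is, 2^6) are hypothetical choices, not taken from any particular code. With A dimensioned (N,N), running the inner loop over the right-hand index steps through memory N words at a time, while running it over the left-hand index gives stride 1.

```fortran
      PROGRAM STRIDE
C     Hypothetical example: N and A are illustrative names only.
      INTEGER N
      PARAMETER (N = 64)
      REAL A(N,N)
      REAL TOTAL
      INTEGER I, J
C     Poor: the inner loop runs over the right-hand index, so
C     successive accesses are N = 2**6 words apart in memory.
      DO 20 I = 1, N
         DO 10 J = 1, N
            A(I,J) = 1.0
   10    CONTINUE
   20 CONTINUE
C     Better: the inner loop runs over the left-hand index, so
C     successive accesses are adjacent in memory (stride 1).
      TOTAL = 0.0
      DO 40 J = 1, N
         DO 30 I = 1, N
            TOTAL = TOTAL + A(I,J)
   30    CONTINUE
   40 CONTINUE
      PRINT *, TOTAL
      END
```

Where the loop order cannot be changed, a commonly used alternative is to pad the leading dimension to a value that is not a power of two (for example N+1), which changes the stride without altering the algorithm.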
User-supplied directives may be addressed to fpp, fmp or cft77, and take the following forms (starting in column 1):
CFPP$ DIRECTIVE [SCOPE]
CMIC$ DIRECTIVE
CDIR$ DIRECTIVE
The scope of an fpp directive may be L (loop), R (routine) or F (file). Directives beginning with CMIC@ or CDIR@ are generated by fpp and fmp, and should not be inserted by the user.
Some of the more popular and useful directives are discussed below. For further information on compiler directives, see CF77 Compiling System, Vol 1: Fortran Reference Manual; for vectorization directives, see Vol 3: Vectorization Guide; and for tasking directives, see Vol 4: Parallel Processing Guide.
      DO 10 I = 1, 10
         A(I) = A(I+J)
   10 CONTINUE
If we happen to know that J will be greater than 10 at runtime, i.e., there is no dependency, we may insert the `CFPP$ NODEPCHK' directive, and the loop will vectorize. The cft77 directive `CDIR$ IVDEP' has the same effect.
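A minimal sketch of where such a directive might sit, using the loop above. The array size, the initialisation and the value J = 15 are assumptions made so that the example is complete and J > 10 holds. The cft77 form is shown; the fpp form `CFPP$ NODEPCHK' would presumably occupy the same position, following the column 1 convention described earlier.

```fortran
      PROGRAM NODEP
C     Hypothetical sketch: the array size and the value of J are
C     assumptions chosen so that J > 10 holds.
      REAL A(30)
      INTEGER I, J
      DO 5 I = 1, 30
         A(I) = REAL(I)
    5 CONTINUE
      J = 15
C     The programmer asserts there is no dependency. The
C     directive starts in column 1, so a compiler that does not
C     recognise it treats the line as an ordinary comment.
CDIR$ IVDEP
      DO 10 I = 1, 10
         A(I) = A(I+J)
   10 CONTINUE
      PRINT *, A(1), A(10)
      END
```

Because the directive is an assertion and not a check, the results are only reliable if the condition on J really does hold at runtime.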
The `CFPP$ EXPAND subroutine' directive inlines all occurrences of subroutine into the current routine. The default is no automatic inlining.
Similarly, the `NEXPAND' directive performs nested expansion, inlining routines called in inlined routines. Bear in mind that fpp may refuse to honour an `NEXPAND' directive if the nesting level is too deep.
The `AUTOEXPAND' directive enables automatic routine inlining, with loop, routine or file scope. Only leaves of the call tree are inlined, and then only provided the routine is smaller than some internal limit. This directive with file scope is equivalent to specifying `cf77 -Wd"-e68" ' on the command line.
The associated `SEARCH (files)' directive is used to tell cf77 where to look for a routine to inline, if it is not in the same file.
The `CFPP$ UNROLL (n)' directive forces loops to unroll. If used with loop scope (the default), the optional parameter n is the number of times to unroll that particular loop. If used with routine or file scope, n is the maximum iteration count (default 3) of loops to be unrolled completely.
Look first at loops which the fpp listing "almost" vectorized - these are likely to be the areas where some progress is possible.
Are there loops inhibited from vectorization which could be split into vectorizable and nonvectorizable sections? (These are usually spotted by fpp.)
Is it possible to dispense with any I/O statements which prevent vectorization or autotasking?
If there are any routines which could be replaced by library routines, it is usually best to do so (again, fpp catches most level 1 and level 2 BLAS). Optimized NAG routines are available; link your code to the library as follows:
cf77 -Wl"-l /usr/local/lib/libnag.a" prog.f
For information on specific routines, type naghelp. Remember that single precision on the Cray is 64-bit, the equivalent of double precision on most other machines, so we use the single precision NAG routines. Thus if the routine name is listed in naghelp as XXXXXF, use XXXXXE.
It is sometimes fruitful to use a dummy array to calculate partial results in vector mode, then use these results in a final scalar loop.
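The dummy-array idea might look like the following sketch. All names here (N, B, C, T, X) and the data values are hypothetical. The original loop is imagined as X(I) = X(I-1) + SQRT(B(I))*C(I), where the recurrence on X inhibits vectorization; the expensive arithmetic is moved into a separate loop that fills the dummy array T, leaving only the recurrence for the scalar loop.

```fortran
      PROGRAM PARTIAL
C     Hypothetical sketch: N, B, C, T and X are illustrative.
      INTEGER N
      PARAMETER (N = 100)
      REAL B(N), C(N), T(N), X(N)
      INTEGER I
      DO 5 I = 1, N
         B(I) = 4.0
         C(I) = 0.5
    5 CONTINUE
C     Vector loop: no iteration depends on another, so the
C     expensive arithmetic goes into the dummy array T.
      DO 10 I = 1, N
         T(I) = SQRT(B(I)) * C(I)
   10 CONTINUE
C     Scalar loop: only the recurrence on X remains.
      X(1) = T(1)
      DO 20 I = 2, N
         X(I) = X(I-1) + T(I)
   20 CONTINUE
      PRINT *, X(N)
      END
```

The cost is the extra storage for T, so this is most worthwhile when the work moved into the vector loop is large compared with the recurrence itself.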