The Queen's University of Belfast
Parallel Computer Centre


9 The arcane world of listing


9.1 fpp

The dependency analysis phase of the cf77 system, fpp, does a significant amount of code restructuring and inserts directives for autotasking and vectorization. It also produces a listing detailing what it has done and why it could not do more. Generate this listing with:

cf77 -Zp -Wd"-l listfile" prog.f

or, more simply:

fpp -l listfile prog.f

This gives a routine-by-routine breakdown of what fpp found and what action it took. Each loop is marked according to whether it is vectorizable, parallelisable, or inhibited from optimization. The routine is then shown in translated form, which may reveal code restructuring, routine inlining or loop unrolling. Finally, a loop summary shows how many loops were optimized and indicates the problems that remain. These problems can then be tackled with compiler directives or command-line options (see later notes).
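
To see why a loop might be marked as inhibited, it can help to compare a loop whose iterations are independent with one that carries a value from one iteration to the next. The following is only an illustrative sketch in Python, not Fortran, and it does not reproduce anything fpp itself does; all function names are hypothetical. It mimics vector execution by evaluating every right-hand side from the pre-loop values before storing any result, then checks whether that gives the same answer as running the iterations in order.

```python
def run_sequential(update, a, b):
    """Execute the loop body one iteration at a time, as scalar code would."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = update(a, b, i)
    return a


def run_vector_style(update, a, b):
    """Mimic vector execution: read all operands first, then store results."""
    old = list(a)
    new = list(a)
    for i in range(1, len(a)):
        new[i] = update(old, b, i)  # every read sees the pre-loop values
    return new


def independent(a, b, i):
    # Like a(i) = 2*b(i): no iteration reads a value written by another.
    return 2 * b[i]


def recurrence(a, b, i):
    # Like a(i) = a(i-1) + b(i): each iteration needs the previous result.
    return a[i - 1] + b[i]


a0 = [1, 0, 0, 0, 0]
b0 = [0, 1, 2, 3, 4]

independent_matches = (
    run_sequential(independent, a0, b0) == run_vector_style(independent, a0, b0)
)
recurrence_matches = (
    run_sequential(recurrence, a0, b0) == run_vector_style(recurrence, a0, b0)
)

print("independent loop gives same result in vector style:", independent_matches)
print("recurrence loop gives same result in vector style: ", recurrence_matches)
```

The independent loop gives the same result either way, while the recurrence does not. A dependence of this kind is one common reason for a loop to be left unoptimized.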

9.2 cft77

The listing produced by cft77 is in some ways similar to that produced by fpp, with the omission of any indication of autotasking. It also tends to be longer, sometimes unmanageably so. Use

cf77 -Wf"-e m" prog.f

or

cft77 -e m prog.f

This gives what is called a "Loopmark" listing of the code, similar to the fpp listing discussed above. More options may be enabled, such as a cross-reference listing, but they make the listing file much larger while adding little useful information.

9.3 xbrowse (incorporating atscope)

This is a FORTRAN-specific editor and debugging aid. It can be useful in the optimization process because it features the atscope utility. Atscope helps you decide whether variables are of private or shared scope in a parallel region, and inserts autotasking directives for you. It is useful for those who wish to improve on fpp's efforts at autotasking. Use

xbrowse &

9.4 ftref

ftref generates a static call tree, showing which routines call which, together with other cross-referencing information. It works from a listing file, i.e. before execution. Use the following commands:

cf77 -Wf"-esx" -c progname.f

ftref -c full -tfull progname.l > progname.xref
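
As a rough illustration of what a static call tree contains, the sketch below builds one from a small piece of source text without running it. This is not how ftref works internally, and the Fortran fragment and all names in it are hypothetical. It assumes every call appears as a CALL statement at the start of a line, so function references and calls inside logical IF statements are missed.

```python
import re
from collections import defaultdict

# Hypothetical Fortran 77 source, used only for illustration.
SOURCE = """\
      PROGRAM MAIN
      CALL SETUP
      CALL SOLVE
      END
      SUBROUTINE SETUP
      CALL INIT
      END
      SUBROUTINE SOLVE
      CALL INIT
      CALL STEP
      END
      SUBROUTINE INIT
      END
      SUBROUTINE STEP
      END
"""

UNIT = re.compile(r"^\s*(?:PROGRAM|SUBROUTINE)\s+(\w+)", re.IGNORECASE)
CALL = re.compile(r"^\s*CALL\s+(\w+)", re.IGNORECASE)


def build_call_graph(source):
    """Map each program unit to the routines it calls, in order of appearance."""
    graph = defaultdict(list)
    current = None
    for line in source.splitlines():
        unit = UNIT.match(line)
        if unit:
            current = unit.group(1).upper()
            graph[current]  # register units that call nothing
            continue
        call = CALL.match(line)
        if call and current is not None:
            callee = call.group(1).upper()
            if callee not in graph[current]:
                graph[current].append(callee)
    return dict(graph)


def format_tree(graph, root, depth=0):
    """Return the call tree below root as indented lines.

    Assumes no recursive calls, which holds for standard Fortran 77.
    """
    lines = ["  " * depth + root]
    for callee in graph.get(root, []):
        lines.extend(format_tree(graph, callee, depth + 1))
    return lines


graph = build_call_graph(SOURCE)
print("\n".join(format_tree(graph, "MAIN")))
```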

9.5 Jumpview

Jumpview is an important tool, as it is the only way to obtain megaflops ratings on the Y-MP EL (on other Y-MP machines hpm and perftrace provide these statistics). It lets you determine the exact timing of every executed code block within your program. Use

cf77 -Wf"-ez" -ltrace prog.f (jumptracing must be enabled by linking the libtrace library)

jt a.out

jumpview -Lumch > jump.report (non-interactive version)

The -L flag requests a non-interactive report. Run jumpview by itself for an interactive X Windows session.

Jumptracing requires recompilation and reloading. It also incurs significant CPU overhead.

Although the UNICOS Performance Utilities Reference Manual claims that Jumptracing works with multitasked codes, the author has found that it does not, so the NCPUS environment variable should be set to 1.

Jumptracing timings are exact and reproducible, in contrast to the operating system timings in Flowtracing and probabilistic timings in Profiling.

Unlike flowview, jumpview gives times for library routines.

In practice, jumptracing helps us to see the ratio of vector to scalar operations in each subroutine. Combined with the megaflops rating, this gives what is probably the clearest indicator of vector performance.

Example

  JUMPTRACE DATA REPORT
  Showing Routines Sorted by Total CPU Time
  (CPU Times are Shown in Seconds)
 
 Name     Called  Time     Avg Time EX %  ACM % Mflops Mmems
 -------- ------- -------- -------- ----- ----- -----  -----
 $WFV$%   303     1.56E-02 5.13E-05 20.1  20.1  0.0    2.0 *****
 NOCV%    306     8.94E-03 2.92E-05 11.5  31.6  0.0    1.7 **
 fwrite   316     8.89E-03 2.81E-05 11.5  43.1  0.8    2.4 **
 $WFI     303     6.35E-03 2.10E-05 8.2   51.3  0.0    5.7 **
 f_wch    309     6.16E-03 1.99E-05 8.0   59.2  0.0    3.3 *
 $gtdsp   306     4.73E-03 1.54E-05 6.1   65.3  0.0    2.3 *
 _xflsbuf 331     4.11E-03 1.24E-05 5.3   70.6  0.6    2.3 *
 __flsbu  309     3.58E-03 1.16E-05 4.6   75.2  0.7    2.2 *
 $pack    306     2.61E-03 8.52E-06 3.4   78.6  0.0    3.3 
 $FFS     303     2.11E-03 6.96E-06 2.7   81.3  0.0    1.3 
 $WFF     303     1.91E-03 6.30E-06 2.5   83.8  0.0    3.5 
 EDMSS%   274     1.77E-03 6.46E-06 2.3   86.0  1.7    0.0 
 write    331     1.59E-03 4.80E-06 2.0   88.1  0.0    1.7 

etc
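
The columns of such a report can be cross-checked with a few lines of code. The sketch below is a hypothetical Python helper, not part of jumpview, that parses the first rows of the example above. It assumes that Avg Time is Time divided by Called and that ACM % is the running total of EX %. The figures in the example agree with that reading to within rounding, but it is an interpretation of the column headings rather than something taken from the manual.

```python
# First five rows copied from the example report above.
REPORT = """\
$WFV$%   303     1.56E-02 5.13E-05 20.1  20.1  0.0    2.0 *****
NOCV%    306     8.94E-03 2.92E-05 11.5  31.6  0.0    1.7 **
fwrite   316     8.89E-03 2.81E-05 11.5  43.1  0.8    2.4 **
$WFI     303     6.35E-03 2.10E-05 8.2   51.3  0.0    5.7 **
f_wch    309     6.16E-03 1.99E-05 8.0   59.2  0.0    3.3 *
"""

# Hypothetical field names for the eight report columns.
FIELDS = ("name", "called", "time", "avg_time",
          "ex_pct", "acm_pct", "mflops", "mmems")


def parse_rows(report):
    """Split each report line on whitespace and convert the numeric columns."""
    rows = []
    for line in report.splitlines():
        parts = line.split()
        if len(parts) < len(FIELDS):
            continue  # skip blank or incomplete lines
        row = {"name": parts[0], "called": int(parts[1])}
        for key, text in zip(FIELDS[2:], parts[2:8]):
            row[key] = float(text)
        rows.append(row)  # a trailing bar of asterisks, if any, is ignored
    return rows


rows = parse_rows(REPORT)
running = 0.0
for row in rows:
    running += row["ex_pct"]
    avg = row["time"] / row["called"]
    print(f'{row["name"]:<9} avg={avg:.2e} (reported {row["avg_time"]:.2e}) '
          f'cumulative={running:.1f} (reported {row["acm_pct"]:.1f})')
```

Small differences between the computed and reported values are expected, since the report prints rounded figures.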


All documents are the responsibility of, and copyright, © their authors and do not represent the views of The Parallel Computer Centre, nor of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk