The Queen's University of Belfast

Parallel Computer Centre

CRAY Y-MP EL

Vector Processing and Performance Utilities

Student Notes

Ruth Dilly

Parallel Computer Centre, Belfast

OHP MATERIAL

1 - Introduction

1.1 - Hardware
1.2 - Software
1.3 - Documentation
1.4 - Obtaining an account
1.5 - Logging on
1.6 - X-windows
1.7 - On-line information
1.7.1 - Manual pages
1.7.2 - Document viewer
1.7.3 - Computer based training
1.7.4 - explain & whatis
1.8 - Running programs
1.8.1 - Scratch disk - temporary file space
1.8.2 - The batch environment
1.8.3 - Using the NAG libraries
1.8.4 - Reducing compilation time - makefiles
1.8.5 - File compression utility

2 - Vectorization

2.1 - Vector Hardware
2.1.1 - Pipelining
2.1.2 - Chaining
2.2 - Comparison - vector and scalar operation
2.2.1 - Scalar and vector processing examples
2.3 - Vector Performance
2.4 - General Requirements for vectorization
2.5 - Conditions that inhibit vectorization
2.5.1 - Examples of vectorization inhibitors
2.6 - Memory contention
2.7 - Memory optimization
2.8 - Amdahl's Law for vectorization
2.9 - Terminology

3 - Dependencies

3.1 - Testing for dependency
3.2 - Rigorous testing for dependency
3.3 - Vectorizing recurrences
3.4 - Data dependency directives

4.1 - IF loops and search loops
4.2 - Branches
4.3 - Transformations
4.4 - Special case - loops
4.5 - Loop nest restructuring
4.6 - Loop optimizations
4.7 - Other common optimization techniques
4.8 - Compiler vectorization directives

5 - Parallel Processing

5.1 - Capabilities
5.2 - Evolution of CRI parallel processing software
5.3 - Autotasking
5.3.1 - Goals of autotasking
5.3.2 - When to use autotasking
5.3.3 - Speedup
5.4 - fpp
5.5 - fmp
5.6 - fpp loop selection criteria for autotasking
5.6.1 - fpp loop optimization techniques
5.6.2 - Additional fpp optimization
5.7 - Autotasking performance
5.7.1 - Estimating percentage parallelism within a program
5.8 - Prerequisites for high performance
5.8.1 - Parallelism and load balancing
5.8.2 - Overhead
5.8.3 - Autotasking analysis tools
5.8.4 - Memory usage
5.9 - Strategy for debugging autotasked code
5.10 - Multitasking terminology

6 - Performance Utilities

6.1 - cf77 Compilation System Phases
6.1.1 - Commands
6.2 - Compiler features
6.3 - Compiler directives
6.4 - Loop unwinding and unrolling

7 - Optimization Strategies

7.1 - Preliminary considerations
7.1.1 - Step 0 Define a baseline
7.1.2 - Step 1 Determine resource target for optimization
7.1.3 - Step 2 Target subroutines for optimization
7.1.4 - Step 3 Determine target loops for vectorization
7.1.5 - Step 4 Recall vector inhibitors
7.1.6 - Step 5 Other Optimization Techniques
7.1.7 - Step 6 Other Information
7.2 - Optimizing Fortran Programs - Code Analysis
7.3 - Types of optimization tools

8 - Optimization Techniques

8.1 - Job accounting
8.2 - Procview
8.3 - Profview
8.4 - Flowview

9 - The arcane world of listing

9.1 - fpp
9.2 - cft77
9.3 - xbrowse (incorporating atscope)
9.4 - ftref
9.5 - Jumpview

10 - Autotasking analysis

10.1 - atexpert
10.2 - mtdump

11 - Optimizing Fortran Programs - Code Optimization

11.1 - I/O and memory enhancements
11.2 - Improvement by directive
11.3 - Unconditional vectorization
11.4 - Code inlining
11.5 - Loop unrolling
11.6 - Short vector loops
11.7 - Autotasking and microtasking directives
11.8 - Other directives
11.9 - Changing the code

All documents are the responsibility of, and copyright © of, their authors and do not represent the views of The Parallel Computer Centre or of The Queen's University of Belfast.
Maintained by Alan Rea, email A.Rea@qub.ac.uk
