Both must satisfy the following requirements in order to be vectorized:
the loop must execute the correct number of times; an early exit is any exit from the loop body that makes the loop run fewer iterations than the trip count calculated for it
the correct exit must be taken
all scalar values must be correct when an exit is taken
Once the requirements are met, the loops are vectorized as follows. The IF expression is evaluated for the full set of loop iterations given by the loop's DO statement. The first iteration at which the IF expression is satisfied fixes the vector length used for the other expressions in the loop, and those expressions are then executed with this length.
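The search-loop scheme above can be sketched in free-form Fortran 90 (not CF77 fixed form). This is a hypothetical illustration, not CF77's generated code: the routine names are invented, and array-section assignments stand in for vector instructions. SEARCH_VECTOR evaluates the exit condition for the whole DO range, uses the first true index to fix the vector length, and then sets the loop index the scalar loop would have left, so that scalar values are correct when the exit is taken.

```fortran
! Hypothetical sketch: array sections model vector operations.
MODULE SEARCH_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Scalar search loop with an early exit.
  SUBROUTINE SEARCH_SCALAR(N, A, X, B, ILAST)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), X
    REAL, INTENT(INOUT) :: B(N)
    INTEGER, INTENT(OUT) :: ILAST
    INTEGER :: I
    DO I = 1, N
      IF (A(I) .GT. X) EXIT       ! early exit
      B(I) = 2.0 * A(I)
    END DO
    ILAST = I                     ! index value left by the loop (N+1 if no exit)
  END SUBROUTINE SEARCH_SCALAR

  ! The same loop in the vectorized style.
  SUBROUTINE SEARCH_VECTOR(N, A, X, B, ILAST)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), X
    REAL, INTENT(INOUT) :: B(N)
    INTEGER, INTENT(OUT) :: ILAST
    INTEGER :: I, IDX(N), VL
    IDX = (/ (I, I = 1, N) /)
    ! Evaluate the IF expression for the full DO range; the first index
    ! where it holds (N+1 if it never does) fixes the vector length.
    ILAST = MIN(MINVAL(IDX, MASK = A .GT. X), N + 1)
    VL = ILAST - 1
    B(1:VL) = 2.0 * A(1:VL)       ! remaining statements, executed with length VL
  END SUBROUTINE SEARCH_VECTOR
END MODULE SEARCH_SKETCH
```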
IF statements are vectorized as follows: a vector mask is set from the conditional result; if no bits in the mask are set, that block of work is skipped; otherwise the elements corresponding to true conditions are gathered, the result is computed, and the result is scattered back into memory.
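A minimal sketch of the mask, skip, gather, compute and scatter steps, again in free-form Fortran 90 with hypothetical names. PACK collects the indices of the true elements, and the vector subscript SEL then performs the gather (on the right-hand side) and the scatter (on the left-hand side).

```fortran
! Hypothetical sketch: vector subscripts model gather/scatter.
MODULE MASKED_IF_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Vector form of:  DO I = 1, N
  !                    IF (A(I) .GT. 0.0) B(I) = SQRT(A(I))
  !                  END DO
  SUBROUTINE MASKED_SQRT(N, A, B)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    REAL, INTENT(INOUT) :: B(N)
    LOGICAL :: MASK(N)
    INTEGER :: I
    INTEGER, ALLOCATABLE :: SEL(:)
    MASK = A .GT. 0.0                        ! set the vector mask
    IF (.NOT. ANY(MASK)) RETURN              ! no bits set: skip the block
    ALLOCATE (SEL(COUNT(MASK)))
    SEL = PACK((/ (I, I = 1, N) /), MASK)    ! indices of the true elements
    B(SEL) = SQRT(A(SEL))                    ! gather, compute, scatter
  END SUBROUTINE MASKED_SQRT
END MODULE MASKED_IF_SKETCH
```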
The code can be changed so that it meets the requirements for vectorization. The following points help to make vectorization possible:
Code motion, ie moving early exits - the compiler can move a loop's final conditional exit to the top of the block containing it, so that the exit precedes the block's vectorizable statements and the number of iterations is known before those statements are executed.
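A sketch of this for a loop whose exit is its last statement, under the same assumptions as before (hypothetical names, Fortran 90 array sections standing in for vector code). Because the exiting iteration still executes the body, the iteration count obtained once the exit test has been moved to the top is the first true index itself rather than one less.

```fortran
! Hypothetical sketch: array sections model vector operations.
MODULE CODE_MOTION_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Scalar loop whose conditional exit is the last statement of the body.
  SUBROUTINE EXIT_LAST_SCALAR(N, A, X, B, ILAST)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), X
    REAL, INTENT(INOUT) :: B(N)
    INTEGER, INTENT(OUT) :: ILAST
    INTEGER :: I
    DO I = 1, N
      B(I) = 2.0 * A(I)
      IF (A(I) .GT. X) EXIT       ! final conditional exit
    END DO
    ILAST = I
  END SUBROUTINE EXIT_LAST_SCALAR

  ! Exit test moved ahead of the vectorizable statement: it is evaluated
  ! for every iteration first, so the iteration count is known up front.
  SUBROUTINE EXIT_LAST_VECTOR(N, A, X, B, ILAST)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), X
    REAL, INTENT(INOUT) :: B(N)
    INTEGER, INTENT(OUT) :: ILAST
    INTEGER :: I, IDX(N), NITER
    IDX = (/ (I, I = 1, N) /)
    ILAST = MIN(MINVAL(IDX, MASK = A .GT. X), N + 1)
    NITER = MIN(ILAST, N)         ! the exiting iteration still runs the body
    B(1:NITER) = 2.0 * A(1:NITER)
  END SUBROUTINE EXIT_LAST_VECTOR
END MODULE CODE_MOTION_SKETCH
```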
Values computed following the exit - the condition for an early exit must not depend on values computed in the portion of the loop following the exit (ie in a previous iteration).
Indirect addressing - vectorization is inhibited when the exit condition involves indirect addressing, because evaluating the condition in advance for iterations beyond the exit may reference out-of-range elements.
branches into a loop from outside prohibit vectorization
loops containing backward branches cannot be vectorized since a backward branch is itself a loop
loops with forward branches permit vectorization
arithmetic IFs, which are obsolete, are not vectorizable since they have multiple destinations
the compiler converts IF loops to DO loops when they have a single entry and a single exit
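Assuming that an IF loop here means a loop built from IF and GO TO statements, the sketch below shows one with a single entry and a single exit next to the DO loop it is equivalent to. It is written in free-form Fortran 90, and the routine names are hypothetical.

```fortran
! Hypothetical sketch: an IF/GO TO loop and its DO-loop equivalent.
MODULE IF_LOOP_SKETCH
  IMPLICIT NONE
CONTAINS
  ! "IF loop": one entry (I = 1) and one exit (the branch to label 20).
  SUBROUTINE ADD_IF_LOOP(N, A, B, C)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: B(N), C(N)
    REAL, INTENT(OUT) :: A(N)
    INTEGER :: I
    I = 1
10  IF (I .GT. N) GO TO 20
    A(I) = B(I) + C(I)
    I = I + 1
    GO TO 10                      ! backward branch closes the loop
20  CONTINUE
  END SUBROUTINE ADD_IF_LOOP

  ! The equivalent DO loop, which can then be vectorized.
  SUBROUTINE ADD_DO_LOOP(N, A, B, C)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: B(N), C(N)
    REAL, INTENT(OUT) :: A(N)
    INTEGER :: I
    DO I = 1, N
      A(I) = B(I) + C(I)
    END DO
  END SUBROUTINE ADD_DO_LOOP
END MODULE IF_LOOP_SKETCH
```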
the compiler can analyse any combination of conditional assignments, conditional and unconditional forward branches, and block IFs; to keep compilation fast, at most 6 conditions can be active simultaneously
CF77 uses the CVMGT merge function to convert conditional reductions into a vectorizable form, provided the expression being reduced contains only +, -, *, count and logical operators.
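CVMGT is a Cray-specific intrinsic, so this sketch uses the standard Fortran 90 MERGE intrinsic as a portable stand-in; like CVMGT, it selects between two values under a logical mask. It shows a conditional sum rewritten so that every element takes part in an unconditional, vectorizable sum. The function names are hypothetical.

```fortran
! Hypothetical sketch: MERGE stands in for Cray's CVMGT.
MODULE CONDITIONAL_SUM_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Scalar conditional reduction.
  REAL FUNCTION POS_SUM_SCALAR(N, A)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    INTEGER :: I
    POS_SUM_SCALAR = 0.0
    DO I = 1, N
      IF (A(I) .GT. 0.0) POS_SUM_SCALAR = POS_SUM_SCALAR + A(I)
    END DO
  END FUNCTION POS_SUM_SCALAR

  ! Branch-free form: MERGE picks A(I) where the mask is true and 0.0
  ! elsewhere, so the reduction itself is unconditional.
  REAL FUNCTION POS_SUM_MERGE(N, A)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    POS_SUM_MERGE = SUM(MERGE(A, 0.0, A .GT. 0.0))
  END FUNCTION POS_SUM_MERGE
END MODULE CONDITIONAL_SUM_SKETCH
```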
Reduction loops - a reduction loop reduces an array to a scalar value by applying a cumulative operation to all of the array's elements; each iteration includes the result of the previous iteration in its expression
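One common way to vectorize a reduction is to keep VL independent partial results and combine them at the end. Whether CF77 uses exactly this scheme is an assumption here, as is the strip length VL = 64. Integer data is used so that both versions give identical results; with REAL data the reassociated sum can differ in the last bits. The names are hypothetical.

```fortran
! Hypothetical sketch: partial sums over strips of assumed length VL.
MODULE REDUCTION_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: VL = 64          ! assumed vector length
CONTAINS
  INTEGER FUNCTION SUM_SCALAR(N, A)
    INTEGER, INTENT(IN) :: N
    INTEGER, INTENT(IN) :: A(N)
    INTEGER :: I
    SUM_SCALAR = 0
    DO I = 1, N
      SUM_SCALAR = SUM_SCALAR + A(I)     ! uses the previous iteration's result
    END DO
  END FUNCTION SUM_SCALAR

  INTEGER FUNCTION SUM_PARTIAL(N, A)
    INTEGER, INTENT(IN) :: N
    INTEGER, INTENT(IN) :: A(N)
    INTEGER :: P(VL), J, NLEN
    P = 0
    DO J = 1, N, VL
      NLEN = MIN(VL, N - J + 1)
      P(1:NLEN) = P(1:NLEN) + A(J:J+NLEN-1)  ! one vector add per strip
    END DO
    SUM_PARTIAL = SUM(P)                 ! combine the VL partial sums
  END FUNCTION SUM_PARTIAL
END MODULE REDUCTION_SKETCH
```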
Implied DO loops - used in I/O statements to read or write array elements; the compiler generates a call to a special entry point in the I/O library that vectorizes portions of the I/O operation. Although the I/O statement itself does not vectorize, the implied DO loop within it does.
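A small example of an implied DO loop in an output statement. Here it writes into a character variable (an internal file) so that the result is easy to check; the routine name is hypothetical, and N is assumed to be at most 20 so that the output fits one record of the format.

```fortran
! Hypothetical sketch: implied DO loop in a WRITE statement.
MODULE IMPLIED_DO_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Writes A(1), ..., A(N) into LINE; assumes N <= 20 and LEN(LINE) >= 4*N.
  SUBROUTINE WRITE_ROW(N, A, LINE)
    INTEGER, INTENT(IN) :: N
    INTEGER, INTENT(IN) :: A(N)
    CHARACTER(LEN=*), INTENT(OUT) :: LINE
    INTEGER :: I
    WRITE (LINE, '(20I4)') (A(I), I = 1, N)   ! implied DO supplies the list
  END SUBROUTINE WRITE_ROW
END MODULE IMPLIED_DO_SKETCH
```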
Array Reference
with a constant index - an array reference with a constant index, eg A(1), is logically equivalent to a scalar reference, but the compiler does not treat it as one, and it can confuse dependence analysis
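A sketch of the constant-index case and the usual remedy of copying the element to a scalar before the loop. Both routines compute the same values; the names are hypothetical. In the first, A(1) is read on every iteration while other elements of A are written, which is the pattern that can confuse dependence analysis.

```fortran
! Hypothetical sketch: constant-index reference versus a scalar copy.
MODULE CONSTANT_INDEX_SKETCH
  IMPLICIT NONE
CONTAINS
  SUBROUTINE SCALE_BY_FIRST(N, A, B)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(INOUT) :: A(N)
    REAL, INTENT(IN) :: B(N)
    INTEGER :: I
    DO I = 2, N
      A(I) = A(1) * B(I)          ! A(1) read while A(2..N) are written
    END DO
  END SUBROUTINE SCALE_BY_FIRST

  SUBROUTINE SCALE_BY_SCALAR(N, A, B)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(INOUT) :: A(N)
    REAL, INTENT(IN) :: B(N)
    INTEGER :: I
    REAL :: T
    T = A(1)                      ! plain scalar: no apparent dependence
    DO I = 2, N
      A(I) = T * B(I)
    END DO
  END SUBROUTINE SCALE_BY_SCALAR
END MODULE CONSTANT_INDEX_SKETCH
```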
Loop interchange - a short innermost loop gives a short vector length, eg
DO I=1, N !N>>10
DO J=1,10
A(I,J) = B(I,J) * C(I,J)
END DO
END DO
Restructured so that the longest vectorizable loop, over I, is the innermost loop; with Fortran's column-major storage this also gives stride-1 access to A, B and C
DO J=1,10
DO I=1, N
A(I,J) = B(I,J) * C(I,J)
END DO
END DO
CF77 selects the best loop for vectorization according to the following criteria:
loop iteration count
stride size of array references
percentage of data dependent code
percentage of conditionally executed code
Loop collapse - converts a nest of loops into a single loop with a large iteration count
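A sketch of loop collapse for a two-dimensional array, for illustration only. The routine names are hypothetical, and the MOD and division that map the single index K back to (I, J) only make the element order visible; a compiler collapsing the nest would instead step through the contiguous storage directly.

```fortran
! Hypothetical sketch: a loop nest and its collapsed single-loop form.
MODULE COLLAPSE_SKETCH
  IMPLICIT NONE
CONTAINS
  SUBROUTINE MUL_NEST(N, M, A, B, C)
    INTEGER, INTENT(IN) :: N, M
    REAL, INTENT(IN) :: B(N, M), C(N, M)
    REAL, INTENT(OUT) :: A(N, M)
    INTEGER :: I, J
    DO J = 1, M
      DO I = 1, N
        A(I, J) = B(I, J) * C(I, J)
      END DO
    END DO
  END SUBROUTINE MUL_NEST

  ! One loop of N*M iterations; K walks the elements in column-major
  ! (storage) order and is mapped back to (I, J).
  SUBROUTINE MUL_COLLAPSED(N, M, A, B, C)
    INTEGER, INTENT(IN) :: N, M
    REAL, INTENT(IN) :: B(N, M), C(N, M)
    REAL, INTENT(OUT) :: A(N, M)
    INTEGER :: K, I, J
    DO K = 1, N * M
      I = MOD(K - 1, N) + 1
      J = (K - 1) / N + 1
      A(I, J) = B(I, J) * C(I, J)
    END DO
  END SUBROUTINE MUL_COLLAPSED
END MODULE COLLAPSE_SKETCH
```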
Loop fusion - combines consecutive loops with no statements between them
Example
DO I=1,N
A(I) = B(I) + C(I)
END DO
DO I=2, N+1
D(I) = E(I) **2
END DO
Combined into
DO I=1,N
A(I) = B(I) + C(I)
D(I+1) = E(I+1) **2
END DO
Source-level loop unrolling - makes a copy of the loop body for every iteration to be executed straight-line eg
DO I =1, N
DO J=1, 4
A(I,J) = B(I,J) * C(I,J)
END DO
END DO
Unrolled vertically
DO I =1, N
A(I,1) = B(I,1) * C(I,1)
A(I,2) = B(I,2) * C(I,2)
A(I,3) = B(I,3) * C(I,3)
A(I,4) = B(I,4) * C(I,4)
END DO
Example
DO I =1, N
T = 0
DO J=1, 4
T = T + B(I,J)
END DO
A(I) = A(I) + T
END DO
Unrolled horizontally
DO I =1, N
A(I) = A(I) + (B(I,1)+B(I,2)+B(I,3)+B(I,4))
END DO
Translation of array notation - array notation is translated into DO loops, which can then be vectorized
VFUNCTION directive - the compiler recognises this directive and treats the named function as a vector intrinsic
Loop splitting - a loop that does not vectorize is split into a loop containing the non-vectorizable statements and a loop containing the vectorizable ones, as in the next example:
DO I=2, N
A(I) = A(I-1) * D(I)
B(I) = A(I) + C(I-1) **2
END DO
Loop split version
DO I=2, N
A(I) = A(I-1) * D(I)
END DO
DO I=2, N
B(I) = A(I) + C(I-1) **2
END DO
Promoting a scalar to a vector - a scalar recurrence can be avoided by using a temporary vector, eg
DO J=1, M
S = BB
DO I=1, N
S = S*C
A(I) = A(I) + S
END DO
END DO
Modified
DO I=1, M
TV(I) = BB
END DO
DO I=1, N
DO J=1, M
TV(J) = TV(J) * C
A(I) = A(I) + TV(J)
END DO
END DO
Using the PARAMETER statement - can improve compiler optimization by providing more compile time information about loop lengths and potential data dependencies
Example
DO I=2, N
A(I-1) = A(I-S) * B(I) !CONDITIONAL VECTOR LOOP
END DO
Modified
INTEGER S
PARAMETER (S=1)
...
DO I=2, N
A(I-1) = A(I-S) * B(I) !VECTOR LOOP
END DO
IF block ordering - placing the most frequently executed conditions first in block IFs
Example
DO I=1, N
IF (A(I) .EQ. 0) THEN
B(I) = AZERO !RARE
ELSEIF (A(I) .LT. 0) THEN
B(I) = ANEG !RARE
ELSE
B(I) = C(I) * D(I)/A(I)
ENDIF
END DO
Modified
DO I=1, N
IF (A(I) .GT. 0) THEN
B(I) = C(I) * D(I)/A(I) ! MOST FREQUENT
ELSEIF (A(I) .EQ. 0) THEN
B(I) = AZERO
ELSE
B(I) = ANEG
ENDIF
END DO
Subroutine in-lining - subroutine and function calls inside a loop prevent vectorization. The loop can be vectorized by bringing the code from the subroutine or function in-line. This can be done manually, using the compiler, or for functions by using a statement function.
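A sketch of the statement-function route, with hypothetical names: the first loop calls a function, and the second defines the same expression as a statement function, which the compiler can expand in line. Statement functions were declared obsolescent in Fortran 95 but remain valid.

```fortran
! Hypothetical sketch: function call versus statement function.
MODULE INLINE_SKETCH
  IMPLICIT NONE
CONTAINS
  REAL FUNCTION POLY(X)
    REAL, INTENT(IN) :: X
    POLY = X * X + 1.0
  END FUNCTION POLY

  ! Loop with a function call in its body.
  SUBROUTINE APPLY_CALL(N, A, B)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    REAL, INTENT(OUT) :: B(N)
    INTEGER :: I
    DO I = 1, N
      B(I) = POLY(A(I))
    END DO
  END SUBROUTINE APPLY_CALL

  ! Same loop with the expression as a statement function.
  SUBROUTINE APPLY_STMT_FN(N, A, B)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    REAL, INTENT(OUT) :: B(N)
    INTEGER :: I
    REAL :: X, PLOC
    PLOC(X) = X * X + 1.0         ! statement function definition
    DO I = 1, N
      B(I) = PLOC(A(I))
    END DO
  END SUBROUTINE APPLY_STMT_FN
END MODULE INLINE_SKETCH
```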
FPP user directives have the following syntax:
CFPP$ directive scope
Scope can take the values R for routine, L for loop, F for file and I for immediate; if it is left blank, the directive applies to the next loop.
Some of the FPP directives, each written after CFPP$, are:
VECTOR/NOVECTOR Turns vectorization on or off
CONCUR/NOCONCUR Enables/disables autotasking
NODEPCHK Ignores potential data dependencies
NEXTSCALAR Disables vectorization of the first DO loop found between the directive and the end of the program unit
Other source code directives, each written after CDIR$, include:
IVDEP (SAFEVL = n) Indicates that any dependencies can be ignored if the vector length does not exceed n - only used when it is known that any apparent dependencies will not cause invalid results if a loop is vectorized
VFUNCTION Declares that a vector version of an external function exists
SHORTLOOP Allows the compiler to generate faster code when a loop's trip count is 64 or less
VSEARCH/NOVSEARCH VSEARCH, the default, enables vectorization of all search loops until a NOVSEARCH is encountered
RECURRENCE/NORECURRENCE Enables or disables vectorization of all reduction loops, again until a NORECURRENCE is encountered or the end of the program unit is reached
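The SAFEVL condition can be illustrated with a loop whose dependence has distance K. In this sketch (hypothetical names, no directive shown, and array-section assignments standing in for vector operations of length VLEN), strips of length at most K reproduce the scalar result while longer strips do not; that is the situation in which declaring a safe vector length of K would be correct.

```fortran
! Hypothetical sketch: strip length versus dependence distance K.
MODULE SAFEVL_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Iteration I reads A(I) and writes A(I+K): dependence distance K.
  SUBROUTINE SHIFT_SCALAR(N, K, A, B)
    INTEGER, INTENT(IN) :: N, K
    INTEGER, INTENT(INOUT) :: A(N + K)
    INTEGER, INTENT(IN) :: B(N)
    INTEGER :: I
    DO I = 1, N
      A(I + K) = A(I) + B(I)
    END DO
  END SUBROUTINE SHIFT_SCALAR

  ! Each section assignment loads all operands before storing,
  ! like one vector instruction of length VLEN.
  SUBROUTINE SHIFT_STRIPS(N, K, VLEN, A, B)
    INTEGER, INTENT(IN) :: N, K, VLEN
    INTEGER, INTENT(INOUT) :: A(N + K)
    INTEGER, INTENT(IN) :: B(N)
    INTEGER :: J, L
    DO J = 1, N, VLEN
      L = MIN(VLEN, N - J + 1)
      A(J + K:J + K + L - 1) = A(J:J + L - 1) + B(J:J + L - 1)
    END DO
  END SUBROUTINE SHIFT_STRIPS
END MODULE SAFEVL_SKETCH
```

With N = 8 and K = 2, a strip length of 2 matches the scalar loop, whereas a strip length of 8 reads elements of A before the earlier iterations have updated them.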