Purpose
The ASSERT directive provides the compiler with characteristics of DO loops that can assist in optimizing source code.
The ASSERT directive takes effect when you specify the -qhot or -qsmp compiler options.
Syntax
>>-ASSERT--(--assertion_list--)-------------------------------->< |
Rules
The first noncomment line (not including other directives) following the ASSERT directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The ASSERT directive applies only to the DO loop immediately following the directive, and not to any nested DO loops.
ITERCNT provides an estimate to the compiler about roughly how many iterations the DO loop will typically run. There is no requirement that the value be accurate; ITERCNT will only affect performance, never correctness.
When NODEPS is specified, the user is explicitly declaring to the compiler that no loop-carried dependencies exist within the DO loop or any procedures invoked from within the DO loop. A loop-carried dependency involves two iterations within a DO loop interfering with one another. Interference occurs in the following situations:
It is possible for two complementary ASSERT directives to apply to any given DO loop. However, an ASSERT directive cannot be followed by a contradicting ASSERT directive for a given DO loop:
!IBM* ASSERT (ITERCNT(10)) !IBM* INDEPENDENT, REDUCTION (A) !IBM* ASSERT (ITERCNT(20)) ! invalid DO I = 1, N A(I) = A(I) * I END DO
In the example above, the ASSERT(ITERCNT(20)) directive contradicts the ASSERT(ITERCNT(10)) directive and is invalid.
The ASSERT directive overrides the -qassert compiler option for the DO loop on which the ASSERT directive is specified.
Examples
Example 1:
! An example of the ASSERT directive with NODEPS. PROGRAM EX1 INTEGER A(100) !IBM* ASSERT (NODEPS) DO I = 1, 100 A(I) = A(I) * FNC1(I) END DO END PROGRAM EX1 FUNCTION FNC1(I) FNC1 = I * I END FUNCTION FNC1
Example 2:
! An example of the ASSERT directive with NODEPS and ITERCNT. SUBROUTINE SUB2 (N) INTEGER A(N) !IBM* ASSERT (NODEPS,ITERCNT(100)) DO I = 1, N A(I) = A(I) * FNC2(I) END DO END SUBROUTINE SUB2 FUNCTION FNC2 (I) FNC2 = I * I END FUNCTION FNC2
Related Information
Purpose
The BLOCK_LOOP directive allows you to exert greater control over optimizations on a specific DO loop inside a loop nest. Using a technique called blocking, the BLOCK_LOOP directive separates large iteration count DO loops into smaller iteration groups. Execution of these smaller groups can increase the efficiency of cache space use and augment performance.
Applying BLOCK_LOOP to a loop with dependencies, or a loop with alternate entry or exit points will produce unexpected results.
Syntax
>>-BLOCK_LOOP--(--n--+--------------+--)----------------------->< '-,--name_list-' |
If you do not specify name, blocking occurs on the first DO loop immediately following the BLOCK_LOOP directive.
Rules
For loop blocking to occur, a BLOCK_LOOP directive must precede a DO loop.
You must not specify the BLOCK_LOOP directive more than once, or combine the directive with NOUNROLL_AND_FUSE, NOUNROLL, UNROLL, UNROLL_AND_FUSE or STREAM_UNROLL directives for the same DO construct.
You must not specify the BLOCK_LOOP directive for a DO WHILE loop or an infinite DO loop.
Examples
! Loop Tiling for Multi-level Memory Heirarchy 2 3 4 INTEGER :: M, N, i, j, k 5 M = 1000 6 N = 1000 7 8 !IBM* BLOCK_LOOP(L3_cache_size, L3_cache_block) 9 do i = 1, N 10 11 !IBM* LOOPID(L3_cache_block) 12 !IBM* BLOCK_LOOP(L2_cache_size, L2_cache_block) 13 do j = 1, N 14 15 !IBM* LOOPID(L2_cache_block) 16 do k = 1, M 17 do l = 1, M 18 . 19 . 20 . 21 end do 22 end do 23 end do 24 end do 25 26 end 27 28 The compiler generated code would be equivalent to: 29 30 do index1 = 1, M, L3_cache_size 31 do i = 1, N 32 do index2 = index1, min(index1 + L3_cache_size, M), L2_cache_size 33 do j = 1, N 34 do k = index2, min(index2 + L2_cache_size, M) 35 do l = 1, M 36 . 37 . 38 . 39 end do 40 end do 41 end do 42 end do 43 end do 44 end do
Related Information
Purpose
When the CNCALL directive is placed before a DO loop, you are explicitly declaring to the compiler that no loop-carried dependencies exist within any procedure called from the DO loop.
This directive only takes effect if you specify either the -qsmp or -qhot compiler option.
Syntax
>>-CNCALL------------------------------------------------------>< |
Rules
The first noncomment line (not including other directives) that is following the CNCALL directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The CNCALL directive applies only to the DO loop that is immediately following the directive and not to any nested DO loops.
When specifying the CNCALL directive, you are explicitly declaring to the compiler that no procedures invoked within the DO loop have any loop-carried dependencies. If the DO loop invokes a procedure, separate iterations of the loop must be able to concurrently call that procedure. The CNCALL directive does not assert that other operations in the loop do not have dependencies, it is only an assertion about procedure references.
A loop-carried dependency occurs when two iterations within a DO loop interfere with one another. See the ASSERT directive for the definition of interference.
Examples
! An example of CNCALL where the procedure invoked has ! no loop-carried dependency but the code within the ! DO loop itself has a loop-carried dependency. PROGRAM EX3 INTEGER A(100) !IBM* CNCALL DO I = 1, N A(I) = A(I) * FNC3(I) A(I) = A(I) + A(I-1) ! This has loop-carried dependency END DO END PROGRAM EX3 FUNCTION FNC3 (I) FNC3 = I * I END FUNCTION FNC3
Related Information
Purpose
The COLLAPSE directive reduces an entire array dimension to a single element by specifying that only the element in the lower bound of an array dimension is accessible. If you do not specify a lower bound, the default lower bound is one.
Used with discretion, the COLLAPSE directive can facilitate an increase in performance by reducing repetitive memory access associated with multiple-dimension arrays.
Syntax
>>-COLLAPSE--(--collapse_array_list--)------------------------->< |
where collapse_array is:
>>-array_name--(--expression_list--)--------------------------->< |
where expression_list is a comma separated list of expression.
Rules
The COLLAPSE directive must contain at least one array.
The COLLAPSE directive applies only to the scoping unit in which it is specified. The declarations of arrays contained in a COLLAPSE directive must appear in the same scoping unit as the directive. An array that is accessible in a scoping unit by use or host association must not specified in a COLLAPSE directive in that scoping unit.
The lowest value you can specify in expression_list is one. The highest value must not be greater than the number of dimensions in the corresponding array.
A single scoping unit can contain multiple COLLAPSE declarations, though you can only specify an array once for a particular scoping unit.
You can not specify an array in both a COLLAPSE directive and an EQUIVALENCE statement.
You can not use the COLLAPSE directive with arrays that are components of derived types.
If you apply both the COLLAPSE and SUBSCRIPTORDER directives to an array, you must specify the SUBSCRIPTORDER directive first.
The COLLAPSE directive applies to:
Examples
Example 1: In the following example, the COLLAPSE directive is applied to the explicit-shape arrays A and B. Referencing A(m,2:100,2:100) and B(m,2:100,2:100) in the inner loops, become A(m,1,1) and B(m,1,1).
!IBM* COLLAPSE(A(2,3),B(2,3)) REAL*8 A(5,100,100), B(5,100,100), c(5,100,100) DO I=1,100 DO J=1,100 DO M=1,5 A(M,J,I) = SIN(C(M,J,I)) B(M,J,I) = COS(C(M,J,I)) END DO DO M=1,5 DO N=1,M C(M,J,I) = C(M,J,I) + A(N,J,I)*B(6-N,J,I) END DO END DO END DO END DO END
Related Information
For more information on the SUBSCRIPTORDER directive, see SUBSCRIPTORDER
Purpose
EJECT directs the compiler to start a new full page of the source listing. If there has been no source listing requested, the compiler will ignore this directive.
Syntax
>>-EJECT------------------------------------------------------->< |
Rules
The EJECT compiler directive can have an inline comment and a label. However, if you specify a statement label, the compiler discards it. Therefore, you must not reference any label on an EJECT directive. An example of using the directive would be placing it before a DO loop that you do not want split across pages in the listing. If you send the source listing to a printer, the EJECT directive provides a page break.
Purpose
The INCLUDE compiler directive inserts a specified statement or a group of statements into a program unit.
Syntax
>>-INCLUDE--+-char_literal_constant-+--+---+------------------->< '-(--name--)------------' '-n-' |
You are not required to specify the full path of the desired file, but must specify the file extension if one exists.
name must contain only characters allowable in the XL Fortran character set. See Characters for the character set supported by XL Fortran.
char_literal_constant is a character literal constant.
Conditional include allows you to selectively activate INCLUDE directives within Fortran source during compilation. Specify the files to include using the -qci compiler option.
In fixed source form, the INCLUDE compiler directive must start after column 6, and can have a label.
You can add an inline comment to the INCLUDE line.
Rules
An included file can contain any complete Fortran source statements and compiler directives, including other INCLUDE compiler directives. Recursive INCLUDE compiler directives are not allowed. An END statement can be part of the included group. The first and last included lines must not be continuation lines. The statements in the include file are processed with the source form of the including file.
If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete. After the inclusion of all groups, the resulting Fortran program must follow all of the Fortran rules for statement order.
For an INCLUDE compiler directive with the left and right parentheses syntax, XL Fortran translates the file name to lowercase unless the -qmixed compiler option is on.
The file system locates the specified filename as follows:
Examples
INCLUDE '/u/userid/dc101' ! full absolute file name specified INCLUDE '/u/userid/dc102.inc' ! INCLUDE file name has an extension INCLUDE 'userid/dc103' ! relative path name specified
INCLUDE (ABCdef) ! includes file abcdef
INCLUDE '../Abc' ! includes file Abc from parent directory ! of directory being searched
Related Information
-qci Option in the XL Fortran User's Guide
Purpose
The INDEPENDENT directive, if used, must precede a DO loop, FORALL statement, or FORALL construct. It specifies that each operation in the FORALL statement or FORALL construct, can be executed in any order without affecting the semantics of the program. It also specifies each iteration of the DO loop, can be executed without affecting the semantics of the program.
Type
This directive only takes effect if you specify either the -qsmp or -qhot compiler option.
Syntax
.---------------------------------------------. V | >>-INDEPENDENT----+-----------------------------------------+-+->< +-,--NEW--(--named_variable_list--)-------+ '-,--REDUCTION--(--named_variable_list--)-' |
Rules
The first noncomment line (not including other directives) following the INDEPENDENT directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The INDEPENDENT directive applies only to the DO loop that is immediately following the directive and not to any nested DO loops.
An INDEPENDENT directive can have at most one NEW clause and at most one REDUCTION clause.
If the directive applies to a DO loop, no iteration of the loop can interfere with any other iteration. Interference occurs in the following situations:
If the NEW clause is specified, the directive must apply to a DO loop. The NEW clause modifies the directive and any surrounding INDEPENDENT directives by accepting any assertions made by such directive(s) as true. It does this even if the variables specified in the NEW clause are modified by each iteration of the loop. Variables specified in the NEW clause behave as if they are private to the body of the DO loop. That is, the program is unaffected if these variables (and any variables associated with them) were to become undefined both before and after each iteration of the loop.
Any variable you specify in the NEW clause or REDUCTION clause must not:
For FORALL, no combination of index values affected by the INDEPENDENT directive assigns to an atomic storage unit that is required by another combination. If a DO loop, FORALL statement, or FORALL construct all have the same body and each is preceded by an INDEPENDENT directive, they behave the same way.
The REDUCTION clause asserts that updates to named variables will occur within REDUCTION statements in the INDEPENDENT loop. Furthermore, the intermediate values of the REDUCTION variables are not used within the parallel section, other than in the updates themselves. Thus, the value of the REDUCTION variable after the construct is the result of a reduction tree.
If you specify the REDUCTION clause, the directive must apply to a DO loop. The only reference to a REDUCTION variable in an INDEPENDENT DO loop must be within a reduction statement.
A REDUCTION variable must be of intrinsic type, but must not be of type character. A REDUCTION variable must not be an allocatable array.
A REDUCTION variable must not occur in:
A REDUCTION statement can have one of the following forms:
>>-reduction_var_ref--=--expr--reduction_op--reduction_var_ref->< >>-reduction_var_ref--=--reduction_var_ref--reduction_op--expr->< >>-reduction_var_ref =--reduction_function--(expr,--reduction_var_ref)->< >>-reduction_var_ref =--reduction_function--(reduction_var_ref,--expr)->< |
where:
The following rules apply to REDUCTION statements:
>>-reduction_var_ref-- = --expr-- - --reduction_var_ref-------->< |
Examples
Example 1:
INTEGER A(10),B(10,12),F !IBM* INDEPENDENT ! The NEW clause cannot be FORALL (I=1:9:2) A(I)=A(I+1) ! specified before a FORALL !IBM* INDEPENDENT, NEW(J) DO M=1,10 J=F(M) ! 'J' is used as a scratch A(M)=J*J ! variable in the loop !IBM* INDEPENDENT, NEW(N) DO N=1,12 ! The first executable statement B(M,N)=M+N*N ! following the INDEPENDENT must END DO ! be either a DO or FORALL END DO END
Example 2:
X=0 !IBM* INDEPENDENT, REDUCTION(X) DO J = 1, M X = X + J**2 END DO
Example 3:
INTEGER A(100), B(100, 100) !IBM* INDEPENDENT, REDUCTION(A), NEW(J) ! Example showing an array used DO I=1,100 ! for a reduction variable DO J=1, 100 A(I)=A(I)+B(J, I) END DO END DO
Related Information
Purpose
The #line directive associates code that is created by cpp or any other Fortran source code generator with input code created by the programmer. Because the preprocessor may cause lines of code to be inserted or deleted, the #line directive can be useful in error reporting and debugging because it identifies which lines in the original source caused the preprocessor to generate the corresponding lines in the intermediate file.
Syntax
>>-#line--line_number--+----------+---------------------------->< '-filename-' |
The #line directive is a noncomment directive and follows the syntax rules for this type of directive.
Rules
The #line directive follows the same rules as other noncomment directives, with the following exceptions:
The #line directive indicates the origin of all code following the directive in the current file. Another #line directive will override a previous one.
If you supply a filename, the subsequent code in the current file will be as if it originated from that filename. If you omit the filename, and no previous #line directive with a specified filename exists in the current file, the code in the current file is treated as if it originated from the current file at the line number specified. If a previous #line directive with a specified filename does exist in the current file, the filename from the previous directive is used.
line_number indicates the position, in the appropriate file, of the line of code following the directive. Subsequent lines in that file are assumed to have a one to one correspondence with subsequent lines in the source file until another #line directive is specified or the file ends.
When XL Fortran invokes cpp for a file, the preprocessor will emit #line directives unless you also specify the -d option.
Examples
The file test.F contains:
! File test.F, Line 1 #include "test.h" PRINT*, "test.F Line 3" ... PRINT*, "test.F Line 6" #include "test.h" PRINT*, "test.F Line 8" END
The file test.h contains:
! File test.h line 1 RRINT*,1 ! Syntax Error PRINT*,2
After the C preprocessor () processes the file test.F with the default options:
#line 1 "test.F" ! File test.F, Line 1 #line 1 "test.h" ! File test.h Line 1 RRINT*,1 ! Syntax Error PRINT*,2 #line 3 "test.F" PRINT*, "test.F Line 3" ... #line 6 PRINT*, "test.F Line 6" #line 1 "test.h" ! File test.h Line 1 RRINT*,1 ! Syntax Error PRINT*,2 #line 8 "test.F" PRINT*, "test.F Line 8" END
The compiler displays the following messages after it processes the file that is created by the C preprocessor:
2 2 |RRINT*,1 !Syntax error ......a................ a - "test.h", line 2.6: 1515-019 (S) Syntax is incorrect. 4 2 |RRINT*,1 !Syntax error ......a................ a - "test.h", line 2.6: 1515-019 (S) Syntax is incorrect.
Related Information
Purpose
The LOOPID directive allows you to assign a unique identifier to loop within a scoping unit. You can use the identifier to direct loop transformations. The -qreport compiler option can use the identifier you create to provide reports on loop transformations.
Syntax
>>-LOOPID--(--name--)------------------------------------------>< |
Rules
The LOOPID directive must immediately precede a BLOCK_LOOP directive or DO construct.
You must not specify a LOOPID directive more than once for a given loop.
You must not specify a LOOPID directive for DO constructs without control statements, DO WHILE constructs, or an infinite DO.
Related Information
Purpose
The NOSIMD directive prohibits the compiler from automatically generating Vector Multimedia eXtension, (VMX), instructions in the loop immediately following the directive, or in the FORALL construct.
Syntax
>>-NOSIMD------------------------------------------------------>< |
Rules
The first noncomment line (not including other directives) following the NOSIMD directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The NOSIMD directive applies only to the DO loop that is immediately following the directive and does not apply to any nested DO loops.
You can use the NOSIMD directive together with loop optimization and SMP directives.
Examples
SUBROUTINE VEC (A, B) REAL*8 A(200), B(200) !IBM* NOSIMD FORALL (N = 1:200), B(N) = B(N) / A(N) END SUBROUTINE
Related Information
Please refer to the -qhot=simd compiler option for information on controlling VMX support for an entire application.
Purpose
The NOVECTOR directive prohibits the compiler from auto-vectorizing the loop immediately following the directive, or in the FORALL statement. . Auto-vectorization refers to converting certain operations performed in a loop and on successive array elements, into a call to a routine that computes several results simultaneously.
Syntax
>>-NOVECTOR---------------------------------------------------->< |
Rules
The first noncomment line (not including other directives) following the NOVECTOR directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The NOVECTOR directive applies only to the DO loop that is immediately following the directive and does not apply to any nested DO loops.
You can use the NOVECTOR directive together with loop optimization and SMP directives.
Examples
SUBROUTINE VEC (A, B) REAL*8 A(200), B(200) !IBM* NOVECTOR DO N = 1, 200 B(N) = B(N) / A(N) END DO END SUBROUTINE
Related Information
Please refer to the -qhot=vector compiler option for information on controlling auto-vectorization for an entire application.
Purpose
The PERMUTATION directive specifies that the elements of each array that is listed in the integer_array_name_list have no repeated values. This directive is useful when you use array elements as subscripts for other array references.
The PERMUTATION directive only takes effect if you specify either the -qsmp or -qhot compiler option.
Syntax
>>-PERMUTATION--(--integer_array_name_list--)------------------>< |
Rules
The first noncomment line (not including other directives) that is following the PERMUTATION directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The PERMUTATION directive applies only to the DO loop that is immediately following the directive, and not to any nested DO loops.
Examples
PROGRAM EX3 INTEGER A(100), B(100) !IBM* PERMUTATION (A) DO I = 1, 100 A(I) = I B(A(I)) = B(A(I)) + A(I) END DO END PROGRAM EX3
Related Information
Purpose
You can specify compiler options to affect an individual compilation unit by putting the @PROCESS compiler directive in the source file. It can override options that are specified in the configuration file, in the default settings, or on the command line.
Syntax
.-+---+----------------------------. | '-,-' | V | >>-@PROCESS----option--+----------------------+-+-------------->< '-(--suboption_list--)-' |
Rules
In fixed source form, @PROCESS can start in column 1 or after column 6. In free source form, the @PROCESS compiler directive can start in any column.
You cannot place a statement label or inline comment on the same line as an @PROCESS compiler directive.
By default, any option settings you designate with the @PROCESS compiler directive are effective only for the compilation unit in which the statement appears. If the file has more than one compilation unit, the option setting is reset to its original state before the next unit is compiled. Trigger constants specified by the DIRECTIVE option are in effect until the end of the file (or until NODIRECTIVE is processed).
The @PROCESS compiler directive must usually appear before the first statement of a compilation unit. The only exceptions are for SOURCE and NOSOURCE, which you can put in @PROCESS directives anywhere in the compilation unit.
Related Information
See Compiler Option Details in the XL Fortran User's Guide for details on compiler options.
Purpose
You can use the SNAPSHOT directive to specify a safe location where a breakpoint can be set with a debug program, and provide a set of variables that must remain visible to the debug program. The SNAPSHOT directive provides support for the -qsmp compiler option, though you can use it in a non-multi-threaded program.
There may be a slight performance hit at the point where the SNAPSHOT directive is set, because the variables must be kept in memory for the debug program to access. Variables made visible by the SNAPSHOT directive are read-only. Undefined behavior will occur if these variables are modified through the debugger. Use with discretion.
Syntax
>>-SNAPSHOT--(--named_variable_list--)------------------------->< |
Rules
To use the SNAPSHOT directive, you must specify the -qdbg compiler option at compilation.
Examples
Example 1: In the following example, the SNAPSHOT directive is used to monitor the value of private variables.
INTEGER :: IDX INTEGER :: OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM INTEGER, ALLOCATABLE :: ARR(:) ! ... !$OMP PARALLEL, PRIVATE(IDX) !$OMP MASTER ALLOCATE(ARR(OMP_GET_NUM_THREADS())) !$OMP END MASTER !$OMP BARRIER IDX = OMP_GET_THREAD_NUM() + 1 !IBM* SNAPSHOT(IDX) ! The PRIVATE variable IDX is made visible ! to the debugger. ARR(IDX) = 2*IDX + 1 !$OMP END PARALLEL
Example 2: In the following example, the SNAPSHOT directive is used to monitor the intermediate values in debugging the program.
SUBROUTINE SHUFFLE(NTH, XDAT) INTEGER, INTENT(IN) :: NTH REAL, INTENT(INOUT) :: XDAT(:) INTEGER :: I_TH, IDX, PART(1), I, J, LB, UB INTEGER :: OMP_GET_THREAD_NUM INTEGER(8) :: Y=1 REAL :: TEMP CALL OMP_SET_NUM_THREADS(NTH) PART = UBOUND(XDAT)/NTH !$OMP PARALLEL, PRIVATE(NUM_TH, I, J, LB, UB, IDX, TEMP), SHARED(XDAT) NUM_TH = OMP_GET_THREAD_NUM() + 1 LB = (NUM_TH - 1)*PART(1) + 1 UB = NUM_TH*PART(1) DO I=LB, UB !$OMP CRITICAL Y = MOD(65539_8*y, 2_8**31) IDX = INT(REAL(Y)/REAL(2_8**31)*(UB - LB) + LB) !SMP$ SNAPSHOT(i, y, idx, num_th, lb, ub) !$OMP END CRITICAL TEMP = XDAT(I) XDAT(I) = XDAT(IDX) XDAT(IDX) = TEMP ENDDO !SMP$ SNAPSHOT(TEMP) ! The user can examine the value ! of the TEMP variable !$OMP END PARALLEL END
Related Information
See the XL Fortran User's Guide for details on the -qdbg compiler option.
Purpose
The SOURCEFORM compiler directive indicates that all subsequent lines are to be processed in the specified source form until the end of the file is reached or until an @PROCESS directive or another SOURCEFORM directive specifies a different source form.
Syntax
>>-SOURCEFORM--(--source--)------------------------------------>< |
Rules
The SOURCEFORM directive can appear anywhere within a file. An include file is compiled with the source form of the including file. If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete.
The SOURCEFORM directive cannot specify a label.
Tip |
---|
To modify your existing files to Fortran 90 free source form where include files exist:
|
Examples
@PROCESS DIRECTIVE(CONVERT*) PROGRAM MAIN ! Main program not yet converted A=1; B=2 INCLUDE 'freeform.f' PRINT *, RESULT ! Reverts to fixed form END
where file freeform.f contains:
!CONVERT* SOURCEFORM(FREE(F90)) RESULT = A + B
Purpose
The STREAM_UNROLL directive instructs the compiler to apply the combined functionality of software prefetch and loop unrolling to DO loops with a large iteration count. Stream unrolling functionality is available only on POWER4(TM) platforms or higher, and optimizes DO loops to use multiple streams. You can specify STREAM_UNROLL for both inner and outer DO loops, and the compiler will use an optimal number of streams to perform stream unrolling where applicable. Applying STREAM_UNROLL to a loop with dependencies will produce unexpected results.
Syntax
>>---STREAM_UNROLL--+---------------------+-------------------->< '-(--unroll_factor--)-' |
Rules
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no stream unrolling will occur. If you want to enable stream unrolling with the -qhot option alone, you must also specify -qnostrict.
For stream unrolling to occur, a STREAM_UNROLL directive must precede a DO loop.
You must not specify the STREAM_UNROLL directive more than once, or combine the directive with BLOCK_LOOP, UNROLL, NOUNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.
You must not specify the STREAM_UNROLL directive for a DO WHILE loop or an infinite DO loop.
Examples
The following is an example of how STREAM_UNROLL can increase performance.
integer, dimension(1000) :: a, b, c integer i, m, n !IBM* stream_unroll(4) do i =1, n a(i) = b(i) + c(i) enddo end
An unroll factor reduces the number of iterations from n to n/4, as follows:
m = n/4 do i =1, n/4 a(i) = b(i) + c(i) a(i+m) = b(i+m) + c(i+m) a(i+2*m) = b(i+2*m) + c(i+2*m) a(i+3*m) = b(i+3*m) + c(i+3*m) enddo
The increased number of read and store operations are distributed among a number of streams determined by the compiler, reducing computation time and boosting performance.
Related Information
Purpose
The SUBSCRIPTORDER directive rearranges the subscripts of an array. This results in a new array shape, since the directive changes the order of array dimensions in the declaration. All references to the array are correspondingly rearranged to match the new array shape.
Used with discretion, the SUBSCRIPTORDER directive may improve performance by increasing the number of cache hits and the amount of data prefetching. You may have to experiment with this directive until you find the arrangement that yields the most performance benefits. You may find SUBSCRIPTORDER especially useful when porting code originally intended for a non-cached hardware architecture.
In a cached hardware architecture, such as the PowerPC, an entire cache line of data is often loaded into the processor in order to access each data element. Changing the storage arrangement can be used to ensure that consecutively accessed elements are stored adjacently. This may result in a performance improvement, as there are more element accesses for each cache line referenced. Additionally, adjacently storing array elements which are consecutively accessed may help to better exploit the processor's prefetching facility.
Syntax
.-,--------------------. V | >>-SUBSCRIPTORDER--(----subscriptorder_array-+--)-------------->< |
where subscriptorder_array is:
.-,---------------------. V | >>-array_name--(----subscriptorder_number-+--)----------------->< |
Rules
The SUBSCRIPTORDER directive must appear in a scoping unit preceding all declarations and references to the arrays in the subscriptorder_array list. The directive only applies to that scoping unit and must contain at least one array. If multiple scoping units share an array, then you must apply the SUBSCRIPTORDER directive to each of the applicable scoping units with identical subscript arrangements. Examples of methods of array sharing between scoping units include COMMON statements, USE statements, and subroutine arguments.
The lowest subscript number in a subscriptorder_number list must be 1. The highest number must be equal to the number of dimensions in the corresponding array. Every integer number between these two limits, including the limits, signifies a subscript number prior to rearrangement and must be included exactly once in the list.
You must not apply a SUBSCRIPTORDER directive multiple times to a particular array in a scoping unit.
You must maintain array shape conformance in passing arrays as actual arguments to elemental procedures, if one of the arrays appears in a SUBSCRIPTORDER directive. You must also adjust the actual arguments of the SHAPE, SIZE, LBOUND, and UBOUND inquiry intrinsic procedures and of most transformational intrinsic procedures.
You must manually modify data in input data files and in explicit initializations for arrays that appear in the SUBSCRIPTORDER directive.
On arrays to which the COLLAPSE directive is also applied, the COLLAPSE directive always refers to the pre-subscriptorder dimension numbers.
You must not rearrange the last dimension of an assumed-size array.
Examples
Example 1: In the following example, the SUBSCRIPTORDER directive is applied to an explicit-shape array and swaps the subscripts in every reference to the array, without affecting the program output.
!IBM* SUBSCRIPTORDER(A(2,1)) INTEGER COUNT/1/, A(3,2) DO J = 1, 3 DO K = 1, 2 ! Inefficient coding: innermost index is accessing rightmost ! dimension. The subscriptorder directive compensates by ! swapping the subscripts in the array's declaration and ! access statements. ! A(J,K) = COUNT PRINT*, J, K, A(J,K) COUNT = COUNT + 1 END DO END DO
Without the directive above, the array shape is (3,2) and the array elements would be stored in the following order:
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2)
With the directive, the array shape is (2,3) and the array elements are stored in the following order:
A(1,1) A(2,1) A(1,2) A(2,2) A(1,3) A(2,3)
Related Information
For more information on the COLLAPSE directive, see COLLAPSE
Purpose
The UNROLL directive instructs the compiler to attempt loop unrolling where applicable. Loop unrolling replicates the body of the DO loop to reduce the number of iterations required to complete the loop.
You can control loop unrolling for an entire file using the -qunroll compiler option. Specifying the directive for a particular DO loop always overrides the compiler option.
Syntax
>>-+-UNROLL--+---------------------+-+------------------------->< | '-(--unroll_factor--)-' | '-NOUNROLL------------------------' |
Rules
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no loop unrolling will occur. If you want to enable loop unrolling with the -qhot option alone, you must also specify -qnostrict.
For loop unrolling to occur, an UNROLL directive must precede a DO loop.
You must not specify the UNROLL directive more than once, or combine the directive with BLOCK_LOOP, NOUNROLL, STREAM_UNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.
You must not specify the UNROLL directive for a DO WHILE loop or an infinite DO loop.
Examples
Example 1: In this example, the UNROLL(2) directive is used to tell the compiler that the body of the loop can be replicated so that the work of two iterations is performed in a single iteration. Instead of performing 1000 iterations, if the compiler unrolls the loop, it will only perform 500 iterations.
!IBM* UNROLL(2) DO I = 1, 1000 A(I) = I END DO
If the compiler chooses to unroll the previous loop, the compiler translates the loop so that it is essentially equivalent to the following:
DO I = 1, 1000, 2 A(I) = I A(I+1) = I + 1 END DO
Example 2: In the first DO loop, UNROLL(3) is used. If unrolling is performed, the compiler will unroll the loop so that the work of three iterations is done in a single iteration. In the second DO loop, the compiler determines how to unroll the loop for maximum performance.
PROGRAM GOODUNROLL INTEGER I, X(1000) REAL A, B, C, TEMP, Y(1000) !IBM* UNROLL(3) DO I = 1, 1000 X(I) = X(I) + 1 END DO !IBM* UNROLL DO I = 1, 1000 A = -I B = I + 1 C = I + 2 TEMP = SQRT(B*B - 4*A*C) Y(I) = (-B + TEMP) / (2*A) END DO END PROGRAM GOODUNROLL
Related Information
Purpose
The UNROLL_AND_FUSE directive instructs the compiler to attempt a loop unroll and fuse where applicable. Loop unrolling replicates the body of multiple DO loops and combines the necessary iterations into a single unrolled loop. Using a fused loop can minimize the required number of loop iterations, while reducing the frequency of cache misses. Applying UNROLL_AND_FUSE to a loop with dependencies will produce unexpected results.
Syntax
>>-+-UNROLL_AND_FUSE--+---------------------+-+---------------->< | '-(--unroll_factor--)-' | '-NOUNROLL_AND_FUSE------------------------' |
Rules
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no loop unrolling will occur. If you want to enable loop unrolling with the -qhot option alone, you must also specify -qnostrict.
For loop unrolling to occur, an UNROLL_AND_FUSE directive must precede a DO loop.
You must not specify the UNROLL_AND_FUSE directive for the innermost DO loop.
You must not specify the UNROLL_AND_FUSE directive more than once, or combine the directive with BLOCK_LOOP, NOUNROLL_AND_FUSE, NOUNROLL, UNROLL, or STREAM_UNROLL directives for the same DO construct.
You must not specify the UNROLL_AND_FUSE directive for a DO WHILE loop or an infinite DO loop.
Examples
Example 1: In the following example, the UNROLL_AND_FUSE directive replicates and fuses the body of the loop. This reduces the number of cache misses for Array B.
INTEGER, DIMENSION(1000, 1000) :: A, B, C !IBM* UNROLL_AND_FUSE(2) DO I = 1, 1000 DO J = 1, 1000 A(J,I) = B(I,J) * C(J,I) END DO END DO END
The DO loop below shows a possible result of applying the UNROLL_AND_FUSE directive.
DO I = 1, 1000, 2 DO J = 1, 1000 A(J,I) = B(I,J) * C(J,I) A(J,I+1) = B(I+1, J) * C(J, I+1) END DO END DO
Example 2: The following example uses multiple UNROLL_AND_FUSE directives:
INTEGER, DIMENSION(1000, 1000) :: A, B, C, D, H !IBM* UNROLL_AND_FUSE(4) DO I = 1, 1000 !IBM* UNROLL_AND_FUSE(2) DO J = 1, 1000 DO k = 1, 1000 A(J,I) = B(I,J) * C(J,I) + D(J,K)*H(I,K) END DO END DO END DO END
Related Information
+----------------------------End of IBM Extension----------------------------+