Purpose
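XL Fortran provides five directives for compiler-assisted software prefetching: PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, PREFETCH_FOR_STORE, PREFETCH_BY_STREAM_BACKWARD, and PREFETCH_BY_STREAM_FORWARD.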
Syntax
The PREFETCH directive can take the following forms:
>>-PREFETCH_BY_LOAD--(--prefetch_variable_list--)--------------><
>>-PREFETCH_FOR_LOAD--(--prefetch_variable_list--)-------------><
>>-PREFETCH_FOR_STORE--(--prefetch_variable_list--)------------><
>>-PREFETCH_BY_STREAM_BACKWARD--(--prefetch_variable--)--------><
>>-PREFETCH_BY_STREAM_FORWARD--(--prefetch_variable--)---------><
Rules
To use the PREFETCH_BY_STREAM_BACKWARD, PREFETCH_BY_STREAM_FORWARD, PREFETCH_FOR_LOAD and PREFETCH_FOR_STORE directives, you must compile for PowerPC hardware.
When you prefetch a variable, the memory block that includes the variable address is loaded into the cache. A memory block is equal to the size of a cache line. Since the variable you are loading into the cache may appear anywhere within the memory block, you may not be able to prefetch all the elements of an array.
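The cache-line granularity described above can be sketched as follows. This is an illustration only: the 64-byte line size, the array name ARR, and the program name LINESTRIDE are assumptions, and the array is assumed to start on a cache-line boundary, which is not guaranteed. The sketch works out how many REAL*4 elements share one line and issues one prefetch per line.

```fortran
      PROGRAM LINESTRIDE
      ! All values below are assumed for illustration only.
      INTEGER, PARAMETER :: LINE_BYTES = 64   ! assumed line size
      INTEGER, PARAMETER :: ELEM_BYTES = 4    ! size of REAL*4
      INTEGER, PARAMETER :: N = 2**5
      ! Number of array elements that share one cache line.
      INTEGER, PARAMETER :: PER_LINE = LINE_BYTES / ELEM_BYTES
      REAL*4 ARR(N)
      INTEGER I

      ! Touch one element per cache line instead of every element.
      DO I = 1, N, PER_LINE
!IBM* PREFETCH_FOR_LOAD (ARR(I))
        PRINT *, 'prefetch issued for element', I
      END DO
      END PROGRAM LINESTRIDE
```

With these assumed values the loop visits elements 1 and 17, which are the same two addresses per array that Example 1 below uses. If the array does not start on a line boundary, the last few elements can fall in a line that is never touched.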
These directives may appear anywhere in your source code where executable constructs may appear.
These directives can add run-time overhead to your program. Therefore you should use the directives only where necessary.
To maximize the effectiveness of the prefetch directives, it is recommended that you specify the LIGHT_SYNC directive after a single prefetch or at the end of a series of prefetches.
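The placement recommended above might look like the following sketch. It assumes that LIGHT_SYNC takes no arguments and uses the same !IBM* trigger as the prefetch directives in this section; the array names X and Y, the program name, and the 64-byte cache line are hypothetical choices for illustration.

```fortran
      PROGRAM SYNCAFTERPREFETCH
      REAL*4 X(2**5), Y(2**5)
      INTEGER I

      ! Give X defined values before it is read.
      DO I = 1, 2**5
        X(I) = REAL(I)
      END DO

      ! A series of prefetches, one address per assumed 64-byte line.
!IBM* PREFETCH_FOR_LOAD (X(1), X(2**4+1))
!IBM* PREFETCH_FOR_STORE (Y(1), Y(2**4+1))
      ! LIGHT_SYNC is specified once, at the end of the series.
!IBM* LIGHT_SYNC

      DO I = 1, 2**5
        Y(I) = 2.0 * X(I)
      END DO
      PRINT *, Y(2**5)
      END PROGRAM SYNCAFTERPREFETCH
```

The sketch shows only where the directive goes. A compiler that does not recognize the !IBM* trigger treats these lines as comments, so the program computes the same result either way.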
Examples
Example 1: This example shows valid uses of the PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, and PREFETCH_FOR_STORE directives.
For this example, assume that the cache line size is 64 bytes and that none of the declared data items are in the cache at the beginning of the program. The comments in the code give the rationale for each directive.
      PROGRAM GOODPREFETCH

      REAL*4 A, B, C, TEMP
      REAL*4 ARRA(2**5), ARRB(2**10), ARRC(2**5)
      INTEGER(4) I, K

! Bring ARRA into cache for writing.
!IBM* PREFETCH_FOR_STORE (ARRA(1), ARRA(2**4+1))

! Bring ARRC into cache for reading.
!IBM* PREFETCH_FOR_LOAD (ARRC(1), ARRC(2**4+1))

! Bring all variables into the cache.
!IBM* PREFETCH_BY_LOAD (A, B, C, TEMP, I , K)

! A subroutine is called to allow clock cycles to pass so that the
! data is loaded into the cache before the data is referenced.
      CALL FOO()
      K = 32
      DO I = 1, 2 ** 5

! Bring ARRB(I*K) into the cache
!IBM* PREFETCH_BY_LOAD (ARRB(I*K))
        A = -I
        B = I + 1
        C = I + 2
        TEMP = SQRT(B*B - 4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B - TEMP) / (2*A)
      END DO
      END PROGRAM GOODPREFETCH
Example 2: In this example, assume that the cache line size is 256 bytes and that none of the declared data items are initially in the cache or in registers. All elements of arrays ARRA and ARRC are then read into the cache.
      PROGRAM PREFETCH_STREAM

      REAL*4 A, B, C, TEMP
      REAL*4 ARRA(2**5), ARRC(2**5), ARRB(2**10)
      INTEGER*4 I, K

! All elements of ARRA and ARRC are read into the cache.
!IBM* PREFETCH_BY_STREAM_FORWARD(ARRA(1))
! You can substitute PREFETCH_BY_STREAM_BACKWARD (ARRC(2**5)) to read all
! elements of ARRA and ARRC into the cache.
      K = 32
      DO I = 1, 2**5
        A = -i
        B = i + 1
        C = i + 2
        TEMP = SQRT(B*B -4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B -TEMP) / (2*A)
      END DO
      END PROGRAM PREFETCH_STREAM
Related Information
For information on applying prefetch techniques to loops with a large iteration count, see the STREAM_UNROLL directive.