PREFETCH

Purpose

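XL Fortran provides five directives for compiler-assisted software prefetching: PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, PREFETCH_FOR_STORE, PREFETCH_BY_STREAM_BACKWARD, and PREFETCH_BY_STREAM_FORWARD.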

Syntax

The PREFETCH directive can take the following forms:



>>-PREFETCH_BY_LOAD--(--prefetch_variable_list--)--------------><

>>-PREFETCH_FOR_LOAD--(--prefetch_variable_list--)-------------><

>>-PREFETCH_FOR_STORE--(--prefetch_variable_list--)------------><

Note: Valid for any PowerPC architecture.

>>-PREFETCH_BY_STREAM_BACKWARD--(--prefetch_variable--)--------><

Note: Valid for any PowerPC architecture.

>>-PREFETCH_BY_STREAM_FORWARD--(--prefetch_variable--)---------><

Note: Valid for any PowerPC architecture.

prefetch_variable_list
is a list of one or more prefetch_variable items, separated by commas.

prefetch_variable
is a variable to be prefetched. The variable must be a data object with a determinable storage address. It can be of any data type, including intrinsic and derived data types. It cannot be a procedure name, subroutine name, module name, function name, constant, label, zero-sized string, or an array with a vector subscript.
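
The following sketch illustrates which operands satisfy the definition of prefetch_variable. It is an illustration only: the derived type POINT and all variable names are hypothetical, and the operands that are not allowed are shown as ordinary comments rather than as directives.

```fortran
      PROGRAM PREFETCH_OPERANDS
      IMPLICIT NONE

      ! POINT is a hypothetical derived type used for illustration.
      TYPE POINT
        REAL*4 X, Y
      END TYPE POINT

      REAL*4 S
      REAL*4 ARR(100)
      TYPE(POINT) P
      INTEGER(4) IDX(3)

      S = 1.0
      ARR = 0.0
      P = POINT(1.0, 2.0)
      IDX = (/ 1, 5, 9 /)

! Allowed: a scalar, an array element, and a derived-type object
! each have a determinable storage address.
!IBM* PREFETCH_FOR_LOAD (S, ARR(1), P)

! Not allowed: a constant, or an array with a vector subscript.
! These lines are plain comments, not directives.
!     PREFETCH_FOR_LOAD (3.14)
!     PREFETCH_FOR_LOAD (ARR(IDX))

      ARR(IDX) = S + P%X + P%Y
      PRINT *, ARR(1), ARR(5), ARR(9)
      END PROGRAM PREFETCH_OPERANDS
```

A compiler that does not recognize the !IBM* trigger treats the directive line as a comment, so the program still compiles and runs without prefetching.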

Rules

To use the PREFETCH_BY_STREAM_BACKWARD, PREFETCH_BY_STREAM_FORWARD, PREFETCH_FOR_LOAD and PREFETCH_FOR_STORE directives, you must compile for PowerPC hardware.

When you prefetch a variable, the memory block that includes the variable address is loaded into the cache. A memory block is equal in size to a cache line. Because the variable may lie anywhere within its memory block, prefetching a single element of an array does not guarantee that all the elements of the array are loaded into the cache.
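
One way to cover an array that spans several memory blocks is to prefetch one element per block, plus the last element in case the array does not start on a block boundary. The sketch below assumes a 128-byte cache line and 4-byte elements; LINE_BYTES, ELEM_BYTES, and the other names are illustrative assumptions, not values supplied by the compiler.

```fortran
      PROGRAM PREFETCH_COVER
      IMPLICIT NONE

      ! LINE_BYTES is an assumed cache line size, not a queried one.
      INTEGER(4), PARAMETER :: LINE_BYTES = 128
      INTEGER(4), PARAMETER :: ELEM_BYTES = 4
      INTEGER(4), PARAMETER :: PER_LINE = LINE_BYTES / ELEM_BYTES
      INTEGER(4), PARAMETER :: N = 1000
      REAL*4 ARR(N), TOTAL
      INTEGER(4) I

      ARR = 1.0

! Touch one element in every cache-line-sized chunk of ARR.
      DO I = 1, N, PER_LINE
!IBM* PREFETCH_FOR_LOAD (ARR(I))
      END DO

! ARR(1) need not start on a cache line boundary, so the block
! holding the tail of ARR can be missed by the loop above.
!IBM* PREFETCH_FOR_LOAD (ARR(N))

      TOTAL = SUM(ARR)
      PRINT *, TOTAL
      END PROGRAM PREFETCH_COVER
```

The sketch shows coverage only. Whether issuing this many prefetches pays off depends on the run-time overhead of the directives and on how much work separates the prefetch from the first use of the data.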

These directives may appear anywhere in your source code where executable constructs may appear.

These directives can add run-time overhead to your program. Therefore you should use the directives only where necessary.

To maximize the effectiveness of the prefetch directives, it is recommended that you specify the LIGHT_SYNC directive after a single prefetch or at the end of a series of prefetches.
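
A minimal sketch of this recommendation follows. It assumes that LIGHT_SYNC is written as a directive with no arguments; see the LIGHT_SYNC directive description for its exact syntax. The array names are hypothetical.

```fortran
      PROGRAM PREFETCH_SYNC
      IMPLICIT NONE
      REAL*4 X(16), Y(16)
      INTEGER(4) I

      X = 2.0

! A series of prefetches: X will be read, Y will be written.
!IBM* PREFETCH_FOR_LOAD (X(1))
!IBM* PREFETCH_FOR_STORE (Y(1))
! One LIGHT_SYNC at the end of the series (assumed syntax).
!IBM* LIGHT_SYNC

      DO I = 1, 16
        Y(I) = X(I) * X(I)
      END DO
      PRINT *, Y(1), Y(16)
      END PROGRAM PREFETCH_SYNC
```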

Examples

Example 1: This example shows valid uses of the PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, and PREFETCH_FOR_STORE directives.

For this example, assume that the size of the cache line is 64 bytes and that none of the declared data items are in the cache at the beginning of the program. The comments in the code give the rationale for each directive:

      PROGRAM GOODPREFETCH
 
      REAL*4 A, B, C, TEMP
      REAL*4 ARRA(2**5), ARRB(2**10), ARRC(2**5)
      INTEGER(4) I, K
 
! Bring ARRA into cache for writing.
!IBM* PREFETCH_FOR_STORE (ARRA(1), ARRA(2**4+1))
 
! Bring ARRC into cache for reading.
!IBM* PREFETCH_FOR_LOAD (ARRC(1), ARRC(2**4+1))
 
! Bring all variables into the cache.
!IBM* PREFETCH_BY_LOAD (A, B, C, TEMP, I, K)
 
! A subroutine is called to allow clock cycles to pass so that the
! data is loaded into the cache before the data is referenced.
      CALL FOO()
      K = 32
      DO I = 1, 2 ** 5
 
! Bring ARRB(I*K) into the cache
!IBM* PREFETCH_BY_LOAD (ARRB(I*K))
        A = -I
        B = I + 1
        C = I + 2
        TEMP = SQRT(B*B - 4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B - TEMP) / (2*A)
      END DO
      END PROGRAM GOODPREFETCH
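
The operands ARRA(1) and ARRA(2**4+1) in Example 1 follow from the assumed 64-byte cache line: each line holds 16 REAL*4 elements, so a 32-element array occupies two lines. The sketch below reproduces that arithmetic. It additionally assumes that the array starts on a cache line boundary, and the program and parameter names are illustrative.

```fortran
      PROGRAM LINE_STARTS
      IMPLICIT NONE
      ! Values taken from the assumptions of Example 1.
      INTEGER(4), PARAMETER :: LINE_BYTES = 64
      INTEGER(4), PARAMETER :: ELEM_BYTES = 4
      INTEGER(4), PARAMETER :: N = 2**5
      INTEGER(4) PER_LINE, I

      ! Number of REAL*4 elements that fit in one cache line.
      PER_LINE = LINE_BYTES / ELEM_BYTES
      PRINT *, 'Elements per cache line:', PER_LINE

      ! Index of the first element in each cache line.
      DO I = 1, N, PER_LINE
        PRINT *, 'Prefetch element', I
      END DO
      END PROGRAM LINE_STARTS
```

Under these assumptions the program reports 16 elements per cache line and the element indices 1 and 17, which match the two operands used for ARRA and ARRC.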

Example 2: In this example, assume that the size of the cache line is 256 bytes and that none of the declared data items are initially in the cache or in registers. All elements of the arrays ARRA and ARRC are then read into the cache.

     PROGRAM PREFETCH_STREAM
 
     REAL*4 A, B, C, TEMP
     REAL*4 ARRA(2**5), ARRC(2**5), ARRB(2**10)
     INTEGER*4 I, K
 
! All elements of ARRA and ARRC are read into the cache.
!IBM* PREFETCH_BY_STREAM_FORWARD(ARRA(1))
! You can substitute PREFETCH_BY_STREAM_BACKWARD (ARRC(2**5)) to read all
! elements of ARRA and ARRC into the cache.
     K = 32
     DO I = 1, 2**5
        A = -I
        B = I + 1
        C = I + 2
        TEMP = SQRT(B*B - 4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B - TEMP) / (2*A)
     END DO
     END PROGRAM PREFETCH_STREAM
 

Related Information

For information on applying prefetch techniques to loops with a large iteration count, see the STREAM_UNROLL directive.

IBM Copyright 2003