The following section contains an alphabetical list of all SMP directives supported by XL Fortran. For information on directive clauses, see SMP Directive Clauses.
Purpose
You can use the ATOMIC directive to update a specific memory location safely within a parallel region. When you use ATOMIC, you ensure that only one thread is writing to the memory location at a time, avoiding errors which might occur from simultaneous writes to the same memory location.
Normally, you would protect a shared variable within a CRITICAL construct if it is being updated by more than one thread at a time. However, certain platforms support atomic operations for updating variables. For example, some platforms might support a hardware instruction that reads a value from a memory location, performs a calculation on it, and writes the result back to the location, all in one atomic action. The ATOMIC directive instructs the compiler to use an atomic operation whenever possible. Otherwise, the compiler will use some other mechanism to perform an atomic update.
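As an illustration (this fragment is not one of the examples below, and assumes a shared real variable X updated inside a parallel region), the following two forms protect the same update; the first allows the compiler to use an atomic operation, while the second always uses a critical section:

!$OMP ATOMIC
      X = X + 1.0       ! atomic update; a hardware instruction if available

!$OMP CRITICAL
      X = X + 1.0       ! equivalent protection through a critical section
!$OMP END CRITICAL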
The ATOMIC directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-ATOMIC------------------------------------------------------><

>>-atomic_statement--------------------------------------------><
where atomic_statement is:
>>-+-update_variable--=--update_variable--operator--expression-----------+-><
   +-update_variable--=--expression--operator--update_variable-----------+
   +-update_variable--=--intrinsic--(--update_variable--,--expression--)-+
   '-update_variable--=--intrinsic--(--expression--,--update_variable--)-'
Rules
The ATOMIC directive applies only to the statement which immediately follows it.
The expression in an atomic_statement is not evaluated atomically. You must ensure that no race conditions exist in the calculation.
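For example, in the following sketch (with assumed shared arrays A and B), only the update of R is atomic; the product A(I) * B(I) is evaluated like any other expression, so A and B must not be modified concurrently by other threads:

!$OMP ATOMIC
      R = R + A(I) * B(I)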
All references made using the ATOMIC directive to the storage location of an update_variable within the entire program must have the same type and type parameters.
The function intrinsic, the operator operator, and the assignment must be the intrinsic function, intrinsic operator, and intrinsic assignment; they must not be a redefined intrinsic function, a defined operator, or a defined assignment.
Examples
Example 1: In the following example, multiple threads are updating a counter. ATOMIC is used to ensure that no updates are lost.
      PROGRAM P
        R = 0.0
!$OMP   PARALLEL DO SHARED(R)
        DO I=1, 10
!$OMP     ATOMIC
          R = R + 1.0
        END DO
        PRINT *,R
      END PROGRAM P
Expected output:
10.0
Example 2: In the following example, an ATOMIC directive is required, because it is uncertain which element of array Y will be updated in each iteration.
      PROGRAM P
        INTEGER, DIMENSION(10) :: Y, INDEX
        INTEGER B
        Y = 5
        READ(*,*) INDEX, B
!$OMP   PARALLEL DO SHARED(Y)
        DO I = 1, 10
!$OMP     ATOMIC
          Y(INDEX(I)) = MIN(Y(INDEX(I)),B)
        END DO
        PRINT *, Y
      END PROGRAM P
Input data:
10 10 8 8 6 6 4 4 2 2 4
Expected output:
5 4 5 4 5 4 5 4 5 4
Example 3: The following example is invalid, because you cannot use an ATOMIC operation to reference an array.
      PROGRAM P
        REAL ARRAY(10)
        ARRAY = 0.0
!$OMP   PARALLEL DO SHARED(ARRAY)
        DO I = 1, 10
!$OMP     ATOMIC
          ARRAY = ARRAY + 1.0
        END DO
        PRINT *, ARRAY
      END PROGRAM P
Example 4: The following example is invalid. The expression must not reference the update_variable.
      PROGRAM P
        R = 0.0
!$OMP   PARALLEL DO SHARED(R)
        DO I = 1, 10
!$OMP     ATOMIC
          R = R + R
        END DO
        PRINT *, R
      END PROGRAM P
Related Information
Purpose
The BARRIER directive enables you to synchronize all threads in a team. When a thread encounters a BARRIER directive, it will wait until all other threads in the team reach the same point.
Type
The BARRIER directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-BARRIER-----------------------------------------------------><
Rules
A BARRIER directive binds to the closest dynamically enclosing PARALLEL directive, if one exists.
A BARRIER directive cannot appear within the dynamic extent of the CRITICAL, DO (work-sharing), MASTER, PARALLEL DO, PARALLEL SECTIONS, SECTIONS, SINGLE, and WORKSHARE directives.
All threads in the team must encounter the BARRIER directive if any thread encounters it.
All BARRIER directives and work-sharing constructs must be encountered in the same order by all threads in the team.
In addition to synchronizing the threads in a team, the BARRIER directive implies the FLUSH directive.
Examples
Example 1: An example of the BARRIER directive binding to the PARALLEL directive. Note: To calculate C, the threads must wait until A and B have been completely assigned.
      SUBROUTINE SUB1
        INTEGER A(1000), B(1000), C(1000)
!$OMP   PARALLEL
!$OMP   DO
        DO I = 1, 1000
          A(I) = SIN(I*2.5)
        END DO
!$OMP   END DO NOWAIT
!$OMP   DO
        DO J = 1, 1000
          B(J) = X + COS(J*5.5)
        END DO
!$OMP   END DO NOWAIT
        ...
!$OMP   BARRIER
        C = A + B
!$OMP   END PARALLEL
      END
Example 2: An example of a BARRIER directive that incorrectly appears inside a CRITICAL section. This can result in a deadlock because only one thread can enter a CRITICAL section at a time.
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP CRITICAL
      DO I = 1, 10
        X = X + 1
!$OMP   BARRIER
        Y = Y + I*I
      END DO
!$OMP END CRITICAL
!$OMP END PARALLEL
Related Information
Purpose
The CRITICAL construct allows you to define independent blocks of code that are to be run by at most one thread at a time. It includes a CRITICAL directive that is followed by a block of code and ends with an END CRITICAL directive.
Type
The CRITICAL and END CRITICAL directives only take effect if you specify the -qsmp compiler option.
Syntax
>>-CRITICAL--+-----------------+-------------------------------><
             '-(--lock_name--)-'

>>-block-------------------------------------------------------><

>>-END CRITICAL--+-----------------+---------------------------><
                 '-(--lock_name--)-'
Rules
The optional lock_name is a name with global scope. You must not use the lock_name to identify any other global entity in the same executable program.
If you specify the lock_name on the CRITICAL directive, you must specify the same lock_name on the corresponding END CRITICAL directive.
If you specify the same lock_name for more than one CRITICAL construct, the compiler will allow only one thread to execute any one of these CRITICAL constructs at any one time. CRITICAL constructs that have different lock_names may be run in parallel.
The same lock protects all CRITICAL constructs that do not have an explicit lock_name. In other words, the compiler will assign the same lock_name, thereby ensuring that only one thread enters any unnamed CRITICAL construct at a time.
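For example, in the following sketch (with assumed shared variables TOTAL and COUNT, updated inside a parallel region), the two unnamed CRITICAL constructs are protected by the same lock, so a thread inside either construct excludes all other threads from both:

!$OMP CRITICAL
      TOTAL = TOTAL + X
!$OMP END CRITICAL
      ! ...
!$OMP CRITICAL
      COUNT = COUNT + 1
!$OMP END CRITICAL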
The lock_name must not share the same name as any local entity of Class 1.
It is illegal to branch into or out of a CRITICAL construct.
The CRITICAL construct may appear anywhere in a program.
Although it is possible to nest a CRITICAL construct within a CRITICAL construct, a deadlock situation may result. The -qsmp=rec_locks compiler option can be used to prevent deadlocks. See the XL Fortran User's Guide for more information.
The CRITICAL and END CRITICAL directives imply the FLUSH directive.
Examples
Example 1: Note that in this example the CRITICAL construct appears within a DO loop that has been marked with the PARALLEL DO directive.
      EXPR=0
!OMP$ PARALLEL DO PRIVATE (I)
      DO I = 1, 100
!OMP$   CRITICAL
        EXPR = EXPR + A(I) * I
!OMP$   END CRITICAL
      END DO
Example 2: An example specifying a lock_name on the CRITICAL construct.
!SMP$ PARALLEL DO PRIVATE(T)
      DO I = 1, 100
        T = B(I) * B(I-1)
!SMP$   CRITICAL (LOCK)
        SUM = SUM + T
!SMP$   END CRITICAL (LOCK)
      END DO
Related Information
Purpose
The DO (work-sharing) construct enables you to divide the execution of the loop among the members of the team that encounter it. The END DO directive enables you to indicate the end of a DO loop that is specified by the DO (work-sharing) directive.
The DO (work-sharing) and END DO directives only take effect when you specify the -qsmp compiler option.
Syntax
          .----------------------.
          V                      |
>>- DO----+------------------+-+-------------------------------><
          '-+---+--do_clause-'
            '-,-'

>>-do_loop-----------------------------------------------------><

>>-+---------------------+-------------------------------------><
   '-END DO--+---------+-'
             '- NOWAIT-'
where do_clause is:
>>-+-firstprivate_clause-+-------------------------------------><
   +-lastprivate_clause--+
   +-ordered_clause------+
   +-private_clause------+
   +-reduction_clause----+
   '-schedule_clause-----'
Rules
The first noncomment line (not including other directives) that follows the DO (work-sharing) directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The DO (work-sharing) directive applies only to the DO loop that is immediately following the directive, and not to any nested DO loops.
The END DO directive is optional. If you use the END DO directive, it must immediately follow the end of the DO loop.
You may have a DO construct that contains several DO statements. If the DO statements share the same DO termination statement, and an END DO directive follows the construct, you can only specify a work-sharing DO directive for the outermost DO statement of the construct.
If you specify NOWAIT on the END DO directive, a thread that completes its iterations of the loop early will proceed to the instructions following the loop. The thread will not wait for the other threads of the team to complete the DO loop. If you do not specify NOWAIT on the END DO directive, each thread will wait for all other threads within the same team at the end of the DO loop.
If you do not specify the NOWAIT clause, the END DO directive implies the FLUSH directive.
All threads in the team must encounter the DO (work-sharing) directive if any thread encounters it. A DO loop must have the same loop boundary and step value for each thread in the team. All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.
A DO (work-sharing) directive must not appear within the dynamic extent of a CRITICAL or MASTER construct. In addition, it must not appear within the dynamic extent of a PARALLEL SECTIONS construct, work-sharing construct, or PARALLEL DO loop, unless it is within the dynamic extent of a PARALLEL construct.
You cannot follow a DO (work-sharing) directive by another DO (work-sharing) directive. You can only specify one DO (work-sharing) directive for a given DO loop.
The DO (work-sharing) directive cannot appear with either an INDEPENDENT or DO SERIAL directive for a given DO loop.
Examples
Example 1: An example of several independent DO loops within a PARALLEL construct. No synchronization is performed after the first work-sharing DO loop, because NOWAIT is specified on the END DO directive.
!$OMP PARALLEL
!$OMP DO
      DO I = 2, N
        B(I) = (A(I) + A(I-1)) / 2.0
      END DO
!$OMP END DO NOWAIT
!$OMP DO
      DO J = 2, N
        C(J) = SQRT(REAL(J*J))
      END DO
!$OMP END DO
      C(5) = C(5) + 10
!$OMP END PARALLEL
      END
Example 2: An example of the SHARED and SCHEDULE clauses.
!$OMP PARALLEL SHARED(A)
!$OMP DO SCHEDULE(STATIC,10)
      DO I = 1, 1000
        A(I) = I * 4
      END DO
!$OMP END DO
!$OMP END PARALLEL
Example 3: An example of both a MASTER and a DO (work-sharing) directive that bind to the closest enclosing PARALLEL directive.
!$OMP PARALLEL DEFAULT(PRIVATE)
      Y = 100
!$OMP MASTER
      PRINT *, Y
!$OMP END MASTER
!$OMP DO
      DO I = 1, 10
        X(I) = I
        X(I) = X(I) + Y
      END DO
!$OMP END PARALLEL
      END
Example 4: An example of both the FIRSTPRIVATE and the LASTPRIVATE clauses on DO (work-sharing) directives.
      X = 100
!$OMP PARALLEL PRIVATE(I), SHARED(X,Y)
!$OMP DO FIRSTPRIVATE(X), LASTPRIVATE(X)
      DO I = 1, 80
        Y(I) = X + I
        X = I
      END DO
!$OMP END PARALLEL
      END
Example 5: A valid example of a work-sharing DO directive applied to nested DO statements with a common DO termination statement.
!$OMP DO                  ! A work-sharing DO directive can ONLY
                          ! precede the outermost DO statement.
      DO 100 I= 1,10
!
!$OMP DO  **Error**       ! Placing the OMP DO directive here is
                          ! invalid
        DO 100 J= 1,10
!         ...
  100 CONTINUE
!$OMP END DO
Related Information
Purpose
The DO SERIAL directive indicates to the compiler that the DO loop that is immediately following the directive must not be parallelized. This directive is useful in blocking automatic parallelization for a particular DO loop. The DO SERIAL directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-DO SERIAL---------------------------------------------------><
Rules
The first noncomment line (not including other directives) that follows the DO SERIAL directive must be a DO loop. The DO SERIAL directive applies only to the DO loop that immediately follows the directive and not to any loops that are nested within that loop.
You can only specify one DO SERIAL directive for a given DO loop. The DO SERIAL directive must not appear with the DO, or PARALLEL DO directive on the same DO loop.
White space is optional between DO and SERIAL.
You should not use the OpenMP trigger constant with this directive.
Examples
Example 1: An example with nested DO loops where the inner loop (the J loop) is not parallelized.
!$OMP PARALLEL DO PRIVATE(S,I), SHARED(A)
      DO I=1, 500
        S=0
!SMP$   DOSERIAL
        DO J=1, 500
          S=S+1
        ENDDO
        A(I)=S+I
      ENDDO
Example 2: An example with the DOSERIAL directive applied in nested loops. In this case, if automatic parallelization is enabled, either the I or the K loop may be parallelized.
      DO I=1, 100
!SMP$   DOSERIAL
        DO J=1, 100
          DO K=1, 100
            ARR(I,J,K)=I+J+K
          ENDDO
        ENDDO
      ENDDO
Related Information
Purpose
The FLUSH directive ensures that each thread has access to data generated by other threads. This directive is required because the compiler may keep values in processor registers if a program is optimized. The FLUSH directive ensures that the memory images that each thread views are consistent.
The FLUSH directive only takes effect if you specify the -qsmp compiler option.
You might be able to improve the performance of your program by using the FLUSH directive instead of the VOLATILE attribute. The VOLATILE attribute causes variables to be flushed after every update and before every use, while FLUSH causes variables to be written to or read from memory only when specified.
Syntax
>>-FLUSH--+--------------------------+-------------------------><
          '-(--variable_name_list--)-'
Rules
You can specify this directive anywhere in your code; however, if you specify it outside of the dynamic extent of a parallel region, it is ignored.
If you specify a variable_name_list, only the variables in that list are written to or read from memory (assuming that they have not been written or read already). All variables in the variable_name_list must be at the current scope and must be thread visible. Thread visible variables can be any of the following:
If you do not specify a variable_name_list, all thread visible variables are written to or read from memory.
When a thread encounters the FLUSH directive, it writes into memory the modifications to the affected variables. The thread also reads the latest copies of the variables from memory if it has local copies of those variables: for example, if it has copies of the variables in registers.
It is not mandatory for all threads in a team to use the FLUSH directive. However, to guarantee that all thread visible variables are current, any thread that modifies a thread visible variable should use the FLUSH directive to update the value of that variable in memory. If you do not use FLUSH or one of the directives that implies FLUSH (see below), the value of the variable might not be the most recent one.
Note that FLUSH is not atomic. You must FLUSH shared variables that are controlled by a shared lock variable with one directive and then FLUSH the lock variable with another. This guarantees that the shared variables are written before the lock variable.
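A minimal sketch of this ordering, using an assumed shared data item DATA and an assumed lock variable LOCKED, might look as follows:

      DATA = ...              ! update the shared data
!$OMP FLUSH(DATA)             ! write the data to memory first
      LOCKED = 1              ! then publish it by setting the lock variable
!$OMP FLUSH(LOCKED)           ! flush the lock variable with a separate directive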
The following directives imply a FLUSH directive unless you specify a NOWAIT clause for those directives to which it applies:
Examples
Example 1: In the following example, two threads perform calculations in parallel and are synchronized when the calculations are complete:
      PROGRAM P
        INTEGER INSYNC(0:1), IAM
!$OMP   PARALLEL DEFAULT(PRIVATE) SHARED(INSYNC)
          IAM = OMP_GET_THREAD_NUM()
          INSYNC(IAM) = 0
!$OMP     BARRIER
          CALL WORK
!$OMP     FLUSH(INSYNC)
          INSYNC(IAM) = 1                   ! Each thread sets a flag
                                            ! once it has
!$OMP     FLUSH(INSYNC)                     ! completed its work.
          DO WHILE (INSYNC(1-IAM) .eq. 0)   ! One thread waits for
                                            ! another to complete
!$OMP       FLUSH(INSYNC)                   ! its work.
          END DO
!$OMP   END PARALLEL
      END PROGRAM P

      SUBROUTINE WORK                       ! Each thread does indep-
                                            ! endent calculations.
        ! ...
!$OMP   FLUSH                               ! flush work variables
                                            ! before INSYNC
                                            ! is flushed.
      END SUBROUTINE WORK
Example 2: The following example is not valid, because it attempts to use FLUSH with a variable that is not thread visible:
      FUNCTION F()
        INTEGER, AUTOMATIC :: I
!$OMP   FLUSH(I)
      END FUNCTION F
Purpose
The MASTER construct enables you to define a block of code that will be run by only the master thread of the team. It includes a MASTER directive that precedes a block of code and ends with an END MASTER directive.
Type
The MASTER and END MASTER directives only take effect if you specify the -qsmp compiler option.
Syntax
>>-MASTER------------------------------------------------------><

>>-block-------------------------------------------------------><

>>-END MASTER--------------------------------------------------><
Rules
It is illegal to branch into or out of a MASTER construct.
A MASTER directive binds to the closest dynamically enclosing PARALLEL directive, if one exists.
A MASTER directive cannot appear within the dynamic extent of a work-sharing construct or within the dynamic extent of the PARALLEL DO, PARALLEL SECTIONS, and PARALLEL WORKSHARE directives.
No implied barrier exists on entry to, or exit from, the MASTER construct.
Examples
Example 1: An example of the MASTER directive binding to the PARALLEL directive.
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP MASTER
      Y = 10.0
      X = 0.0
      DO I = 1, 4
        X = X + COS(Y) + I
      END DO
!$OMP END MASTER
!$OMP BARRIER
!$OMP DO PRIVATE(J)
      DO J = 1, 10000
        A(J) = X + SIN(J*2.5)
      END DO
!$OMP END DO
!$OMP END PARALLEL
      END
Related Information
Purpose
The ORDERED / END ORDERED directives cause a block of code within a parallel loop to be executed in the order that the iterations would take if the loop were run sequentially. You can force the code inside the ORDERED construct to run in a predictable order while the code outside of the construct runs in parallel.
The ORDERED and END ORDERED directives only take effect if you specify the -qsmp compiler option.
Syntax
>>-ORDERED-----------------------------------------------------><

>>-block-------------------------------------------------------><

>>-END ORDERED-------------------------------------------------><
Rules
The ORDERED directive can only appear in the dynamic extent of a DO or PARALLEL DO directive. It is illegal to branch into or out of an ORDERED construct.
The ORDERED directive binds to the nearest dynamically enclosing DO or PARALLEL DO directive. You must specify the ORDERED clause on the DO or PARALLEL DO directive to which the ORDERED construct binds.
ORDERED constructs that bind to different DO directives are independent of each other.
Only one thread can execute an ORDERED construct at a time. Threads enter the ORDERED construct in the order of the loop iterations. A thread will enter the ORDERED construct if all of the previous iterations have either executed the construct or will never execute the construct.
Each iteration of a parallel loop with an ORDERED construct can only execute that ORDERED construct once. Each iteration of a parallel loop can execute at most one ORDERED directive. An ORDERED construct cannot appear within the dynamic extent of a CRITICAL construct.
Examples
Example 1: In this example, an ORDERED parallel loop counts down.
      PROGRAM P
!$OMP   PARALLEL DO ORDERED
        DO I = 3, 1, -1
!$OMP     ORDERED
          PRINT *,I
!$OMP     END ORDERED
        END DO
      END PROGRAM P
The expected output of this program is:
3
2
1
Example 2: This example shows a program with two ORDERED constructs in a parallel loop. Each iteration can only execute a single section.
      PROGRAM P
!$OMP   PARALLEL DO ORDERED
        DO I = 1, 3
          IF (MOD(I,2) == 0) THEN
!$OMP       ORDERED
            PRINT *, I*10
!$OMP       END ORDERED
          ELSE
!$OMP       ORDERED
            PRINT *, I
!$OMP       END ORDERED
          END IF
        END DO
      END PROGRAM P
The expected output of this program is:
1
20
3
Example 3: In this example, the program computes the sum of all elements of an array that are greater than a threshold. ORDERED is used to ensure that the results are always reproducible: roundoff will take place in the same order every time the program is executed, so the program will always produce the same results.
      PROGRAM P
        REAL :: A(1000)
        REAL :: THRESHOLD = 999.9
        REAL :: SUM = 0.0
!$OMP   PARALLEL DO ORDERED
        DO I = 1, 1000
          IF (A(I) > THRESHOLD) THEN
!$OMP       ORDERED
            SUM = SUM + A(I)
!$OMP       END ORDERED
          END IF
        END DO
      END PROGRAM P
Related Information
Purpose
The PARALLEL construct enables you to define a block of code that can be executed by a team of threads concurrently. The PARALLEL construct includes a PARALLEL directive that is followed by one or more blocks of code, and ends with an END PARALLEL directive.
The PARALLEL and END PARALLEL directives only take effect if you specify the -qsmp compiler option.
Syntax
               .----------------------------.
               V                            |
>>-PARALLEL----+------------------------+-+--------------------><
               '-+---+--parallel_clause-'
                 '-,-'

>>-block-------------------------------------------------------><

>>-END PARALLEL------------------------------------------------><
where parallel_clause is:
>>-+-copyin_clause-----------------+---------------------------><
   +-default_clause----------------+
   +-firstprivate_clause-----------+
   +-IF--(--scalar_logical_expr--)-+
   +-num_threads_clause------------+
   +-private_clause----------------+
   +-reduction_clause--------------+
   '-shared_clause-----------------'
Rules
It is illegal to branch into or out of a PARALLEL construct.
The IF and DEFAULT clauses can appear at most once in a PARALLEL directive.
You should be careful when you perform input/output operations in a parallel region. If multiple threads execute a Fortran I/O statement on the same unit, you should make sure that the threads are synchronized. If you do not, the behavior is undefined. Also note that although in the XL Fortran implementation each thread has exclusive access to the I/O unit, the OpenMP specification does not require exclusive access.
Directives that bind to a parallel region will bind to that parallel region even if it is serialized.
The END PARALLEL directive implies the FLUSH directive.
Examples
Example 1: An example of a PARALLEL construct with a PRIVATE clause that encloses an inner PARALLEL construct. Note: The SHARED clause is present on the inner PARALLEL construct.
!$OMP PARALLEL PRIVATE(X)
!$OMP DO
      DO I = 1, 10
        X(I) = I
!$OMP   PARALLEL SHARED (X,Y)
!$OMP   DO
        DO K = 1, 10
          Y(K,I) = K * X(I)
        END DO
!$OMP   END DO
!$OMP   END PARALLEL
      END DO
!$OMP END DO
!$OMP END PARALLEL
Example 2: An example showing that a variable must not appear in both a PRIVATE and a SHARED clause.
!$OMP PARALLEL PRIVATE(A), SHARED(A)
!$OMP DO
      DO I = 1, 1000
        A(I) = I * I
      END DO
!$OMP END DO
!$OMP END PARALLEL
Example 3: This example demonstrates the use of the COPYIN clause. Each thread created by the PARALLEL directive has its own copy of the common block BLOCK. The COPYIN clause causes the initial value of FCTR to be copied into the threads that execute iterations of the DO loop.
      PROGRAM TT
        COMMON /BLOCK/ FCTR
        INTEGER :: I, FCTR
!$OMP   THREADPRIVATE(/BLOCK/)
        INTEGER :: A(100)
        FCTR = -1
        A = 0
!$OMP   PARALLEL COPYIN(FCTR)
!$OMP   DO
        DO I=1, 100
          FCTR = FCTR + I
          CALL SUB(A(I), I)
        ENDDO
!$OMP   END PARALLEL
        PRINT *, A
      END PROGRAM

      SUBROUTINE SUB(AA, J)
        INTEGER :: FCTR, AA, J
        COMMON /BLOCK/ FCTR
!$OMP   THREADPRIVATE(/BLOCK/)    ! EACH THREAD GETS ITS OWN COPY
                                  ! OF BLOCK.
        AA = FCTR
        FCTR = FCTR - J
      END SUBROUTINE SUB
The expected output is:
0 1 2 3 ... 96 97 98 99
Related Information
Purpose
The PARALLEL DO directive enables you to specify which loops the compiler should parallelize. This is semantically equivalent to:
!$OMP PARALLEL
!$OMP DO
      ...
!$OMP ENDDO
!$OMP END PARALLEL
and is a convenient way of parallelizing loops. The END PARALLEL DO directive allows you to indicate the end of a DO loop that is specified by the PARALLEL DO directive.
Type
The PARALLEL DO and END PARALLEL DO directives only take effect if you specify the -qsmp compiler option.
Syntax
                  .-------------------------------.
                  V                               |
>>-PARALLEL DO----+---------------------------+-+--------------><
                  '-+---+--parallel_do_clause-'
                    '-,-'

>>-parallel_do_loop--------------------------------------------><

>>-+-----------------+-----------------------------------------><
   '-END PARALLEL DO-'
where parallel_do_clause is:
>>-+-copyin_clause----------------------+----------------------><
   +-default_clause---------------------+
   +-firstprivate_clause----------------+
   +-IF--(--scalar_logical_expr--)------+
   +-lastprivate_clause-----------------+
   +-num_threads_clause-----------------+
   +-ordered_clause---------------------+
   +-private_clause---------------------+
   +-reduction_clause-------------------+
   +-SCHEDULE--(--sched_type--+----+--)-+
   |                          '-,n-'    |
   '-shared_clause----------------------'
Rules
The first noncomment line (not including other directives) that follows the PARALLEL DO directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The PARALLEL DO directive applies only to the DO loop that is immediately following the directive, and not to any nested DO loops.
If you specify a DO loop by a PARALLEL DO directive, the END PARALLEL DO directive is optional. If you use the END PARALLEL DO directive, it must immediately follow the end of the DO loop.
You may have a DO construct that contains several DO statements. If the DO statements share the same DO termination statement, and an END PARALLEL DO directive follows the construct, you can only specify a PARALLEL DO directive for the outermost DO statement of the construct.
You must not follow the PARALLEL DO directive by a DO (work-sharing) or DO SERIAL directive. You can specify only one PARALLEL DO directive for a given DO loop.
All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.
The PARALLEL DO directive must not appear with the INDEPENDENT directive for a given DO loop.
The IF clause may appear at most once in a PARALLEL DO directive.
An IF expression is evaluated outside of the context of the parallel construct. Any function reference in the IF expression must not have side effects.
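For example, the following sketch (with an assumed trip count N and array A) runs the loop in parallel only when the IF expression is true; the expression is evaluated in the serial context before the parallel region is created:

!$OMP PARALLEL DO IF(N > 10000)
      DO I = 1, N
        A(I) = SQRT(REAL(I))
      END DO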
By default, a nested parallel loop is serialized, regardless of the setting of the IF clause. You can change this default by using the -qsmp=nested_par compiler option.
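The following sketch illustrates this default: the inner PARALLEL DO is serialized unless you compile with -qsmp=nested_par (array A and bounds N and M are assumed):

!$OMP PARALLEL DO
      DO I = 1, N
!$OMP   PARALLEL DO          ! serialized by default
        DO J = 1, M
          A(J, I) = I + J
        END DO
      END DO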
If the REDUCTION variable of an inner DO loop appears in the PRIVATE or LASTPRIVATE clause of an enclosing DO loop or PARALLEL SECTIONS construct, the variable must be initialized before the inner DO loop.
A variable that appears in the REDUCTION clause of an INDEPENDENT directive of an enclosing DO loop must not also appear in the data_scope_entity_list of the PRIVATE or LASTPRIVATE clause.
You should be careful when you perform input/output operations in a parallel region. If multiple threads execute a Fortran I/O statement on the same unit, you should make sure that the threads are synchronized. If you do not, the behavior is undefined. Also note that although in the XL Fortran implementation each thread has exclusive access to the I/O unit, the OpenMP specification does not require exclusive access.
Directives that bind to a parallel region will bind to that parallel region even if it is serialized.
Examples
Example 1: A valid example with the LASTPRIVATE clause.
!$OMP PARALLEL DO PRIVATE(I), LASTPRIVATE (X)
      DO I = 1,10
        X = I * I
        A(I) = X * B(I)
      END DO
      PRINT *, X       ! X has the value 100
Example 2: A valid example with the REDUCTION clause.
!$OMP PARALLEL DO PRIVATE(I), REDUCTION(+:MYSUM)
      DO I = 1, 10
        MYSUM = MYSUM + IARR(I)
      END DO
Example 3: A valid example where more than one thread accesses a variable that is marked as SHARED, but the variable is used only in a CRITICAL construct.
!$OMP PARALLEL DO SHARED (X)
      DO I = 1, 10
        A(I) = A(I) * I
!$OMP   CRITICAL
        X = X + A(I)
!$OMP   END CRITICAL
      END DO
Example 4: A valid example of the END PARALLEL DO directive.
      REAL A(100), B(2:100), C(100)
!$OMP PARALLEL DO
      DO I = 2, 100
        B(I) = (A(I) + A(I-1))/2.0
      END DO
!$OMP END PARALLEL DO
!$OMP PARALLEL DO
      DO J = 1, 100
        C(J) = X + COS(J*5.5)
      END DO
!$OMP END PARALLEL DO
      END
Related Information
Purpose
The PARALLEL SECTIONS construct enables you to define independent blocks of code that the compiler can execute concurrently. The PARALLEL SECTIONS construct includes a PARALLEL SECTIONS directive followed by one or more blocks of code delimited by the SECTION directive, and ends with an END PARALLEL SECTIONS directive.
The PARALLEL SECTIONS, SECTION and END PARALLEL SECTIONS directives only take effect if you specify the -qsmp compiler option.
Syntax
                        .-------------------------------------.
                        V                                     |
>>-PARALLEL SECTIONS----+---------------------------------+-+--><
                        '-+---+--parallel_sections_clause-'
                          '-,-'

                         .--------------------.
                         V                    |
>>-+---------+--block----+----------------+-+------------------><
   '-SECTION-'           '-SECTION--block-'

>>-END PARALLEL SECTIONS---------------------------------------><
where parallel_sections_clause is:
>>-+-copyin_clause-----------------+---------------------------><
   +-default_clause----------------+
   +-firstprivate_clause-----------+
   +-IF--(--scalar_logical_expr--)-+
   +-lastprivate_clause------------+
   +-num_threads_clause------------+
   +-private_clause----------------+
   +-reduction_clause--------------+
   '-shared_clause-----------------'
Rules
The PARALLEL SECTIONS construct includes the delimiting directives, and the blocks of code they enclose. The rules below also refer to sections. You define a section as the block of code within the delimiting directives.
The SECTION directive marks the beginning of a block of code. At least one SECTION and its block of code must appear within the PARALLEL SECTIONS construct. Note, however, that you do not have to specify the SECTION directive for the first section. The end of a block is delimited by either another SECTION directive or by the END PARALLEL SECTIONS directive.
You can use the PARALLEL SECTIONS construct to specify parallel execution of the identified sections of code. There is no assumption as to the order in which sections are executed. Each section must not interfere with any other section in the construct unless the interference occurs within a CRITICAL construct. See the definition of interference outside a CRITICAL construct for more information.
It is illegal to branch into or out of any block of code that is defined by the PARALLEL SECTIONS construct.
The compiler determines how to divide the work among the threads based on a number of factors, such as the number of threads and the number of sections to be executed in parallel. Therefore, a single thread may execute more than one SECTION, or a thread may not execute any SECTION.
All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.
Within a PARALLEL SECTIONS construct, variables that do not appear in the PRIVATE clause are assumed to be SHARED by default.
In a PARALLEL SECTIONS construct, a variable that appears in the REDUCTION clause of an INDEPENDENT directive or the PARALLEL DO directive of an enclosing DO loop must not also appear in the data_scope_entity_list of the PRIVATE clause.
If the REDUCTION variable of the inner PARALLEL SECTIONS construct appears in the PRIVATE clause of an enclosing DO loop or PARALLEL SECTIONS construct, the variable must be initialized before the inner PARALLEL SECTIONS construct.
The PARALLEL SECTIONS construct must not appear within a CRITICAL construct.
You should be careful when you perform input/output operations in a parallel region. If multiple threads execute a Fortran I/O statement on the same unit, you should make sure that the threads are synchronized. If you do not, the behavior is undefined. Also note that although in the XL Fortran implementation each thread has exclusive access to the I/O unit, the OpenMP specification does not require exclusive access.
Directives that bind to a parallel region will bind to that parallel region even if it is serialized.
The END PARALLEL SECTIONS directive implies the FLUSH directive.
Examples
Example 1:
!$OMP PARALLEL SECTIONS
!$OMP SECTION
      DO I = 1, 10
        C(I) = MAX(A(I),A(I+1))
      END DO
!$OMP SECTION
      W = U + V
      Z = X + Y
!$OMP END PARALLEL SECTIONS
Example 2: In this example, the index variable I is declared as PRIVATE. Note also that the first optional SECTION directive has been omitted.
!$OMP PARALLEL SECTIONS PRIVATE(I)
      DO I = 1, 100
        A(I) = A(I) * I
      END DO
!$OMP SECTION
      CALL NORMALIZE (B)
      DO I = 1, 100
        B(I) = B(I) + 1.0
      END DO
!$OMP SECTION
      DO I = 1, 100
        C(I) = C(I) * C(I)
      END DO
!$OMP END PARALLEL SECTIONS
Example 3: This example is invalid because there is a data dependency for the variable C across sections.
!$OMP PARALLEL SECTIONS
!$OMP SECTION
      DO I = 1, 10
        C(I) = C(I) * I
      END DO
!$OMP SECTION
      DO K = 1, 10
        D(K) = C(K) + K
      END DO
!$OMP END PARALLEL SECTIONS
Related Information
Purpose
The PARALLEL WORKSHARE construct provides a short form method for including a WORKSHARE directive inside a PARALLEL construct.
Syntax
                         .--------------------------------------.
                         V                                      |
>>-PARALLEL WORKSHARE----+----------------------------------+-+-><
                         '-+---+--parallel_workshare_clause-'
                           '-,-'

>>-block-------------------------------------------------------><

>>-END PARALLEL WORKSHARE--------------------------------------><
where parallel_workshare_clause is any of the clauses accepted by either the PARALLEL or WORKSHARE directives.
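For illustration, the following sketch (with assumed conformable arrays A, B, and C) shows the combined form and the equivalent PARALLEL and WORKSHARE constructs that it abbreviates:

!$OMP PARALLEL WORKSHARE
      A = B + C
!$OMP END PARALLEL WORKSHARE

! ... is equivalent to:

!$OMP PARALLEL
!$OMP WORKSHARE
      A = B + C
!$OMP END WORKSHARE
!$OMP END PARALLEL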
Related Information
Purpose
The SCHEDULE directive allows the user to specify the chunking method for parallelization. Work is assigned to threads in different manners depending on the scheduling type or chunk size used.
The SCHEDULE directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-SCHEDULE--(--sched_type--+------+--)------------------------><
                            '-,--n-'
For more information on sched_type parameters, see the SCHEDULE clause.
Rules
The SCHEDULE directive must appear in the specification part of a scoping unit.
Only one SCHEDULE directive may appear in the specification part of a scoping unit.
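A minimal sketch (using assumed names and the !SMP$ trigger constant) showing the directive placed in the specification part of a subprogram:

      SUBROUTINE SCALE_ARRAY(A, N)
        INTEGER N, I
        REAL A(N)
!SMP$   SCHEDULE(DYNAMIC, 10)    ! placed in the specification part
!$OMP   PARALLEL DO              ! this loop has no SCHEDULE clause, so the
        DO I = 1, N              ! chunking method is expected to come from
          A(I) = A(I) * 2.0      ! the directive
        END DO
      END SUBROUTINE SCALE_ARRAY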
The SCHEDULE directive applies to one of the following:
Any dummy arguments appearing or referenced in the specification expression for the chunk size n must also appear in the SUBROUTINE or FUNCTION statement and in all ENTRY statements appearing in the given subprogram.
If the specified chunk size n is greater than the number of iterations, the loop will not be parallelized and will execute on a single thread.
If you specify more than one method of determining the chunking algorithm, the compiler will follow, in order of precedence:
Examples
Example 1. Given the following information:
number of iterations = 1000
number of threads    = 4
and using the GUIDED scheduling type, the chunk sizes would be as follows:
250 188 141 106 79 59 45 33 25 19 14 11 8 6 4 3 3 2 1 1 1 1
The iterations would then be divided into the following chunks:
chunk 1  = iterations 1 to 250
chunk 2  = iterations 251 to 438
chunk 3  = iterations 439 to 579
chunk 4  = iterations 580 to 685
chunk 5  = iterations 686 to 764
chunk 6  = iterations 765 to 823
chunk 7  = iterations 824 to 868
chunk 8  = iterations 869 to 901
chunk 9  = iterations 902 to 926
chunk 10 = iterations 927 to 945
chunk 11 = iterations 946 to 959
chunk 12 = iterations 960 to 970
chunk 13 = iterations 971 to 978
chunk 14 = iterations 979 to 984
chunk 15 = iterations 985 to 988
chunk 16 = iterations 989 to 991
chunk 17 = iterations 992 to 994
chunk 18 = iterations 995 to 996
chunk 19 = iterations 997 to 997
chunk 20 = iterations 998 to 998
chunk 21 = iterations 999 to 999
chunk 22 = iterations 1000 to 1000
A possible scenario for the division of work could be:
thread 1 executes chunks 1 5 10 13 18 20
thread 2 executes chunks 2 7 9 14 16 22
thread 3 executes chunks 3 6 12 15 19
thread 4 executes chunks 4 8 11 17 21
Example 2. Given the following information:
number of iterations = 100
number of threads    = 4
and using the AFFINITY scheduling type, the iterations would be divided into the following partitions:
partition 1 = iterations 1 to 25
partition 2 = iterations 26 to 50
partition 3 = iterations 51 to 75
partition 4 = iterations 76 to 100
The partitions would be divided into the following chunks:
chunk 1a = iterations 1 to 13
chunk 1b = iterations 14 to 19
chunk 1c = iterations 20 to 22
chunk 1d = iterations 23 to 24
chunk 1e = iterations 25 to 25
chunk 2a = iterations 26 to 38
chunk 2b = iterations 39 to 44
chunk 2c = iterations 45 to 47
chunk 2d = iterations 48 to 49
chunk 2e = iterations 50 to 50
chunk 3a = iterations 51 to 63
chunk 3b = iterations 64 to 69
chunk 3c = iterations 70 to 72
chunk 3d = iterations 73 to 74
chunk 3e = iterations 75 to 75
chunk 4a = iterations 76 to 88
chunk 4b = iterations 89 to 94
chunk 4c = iterations 95 to 97
chunk 4d = iterations 98 to 99
chunk 4e = iterations 100 to 100
A possible scenario for the division of work could be:
thread 1 executes chunks 1a 1b 1c 1d 1e 4d
thread 2 executes chunks 2a 2b 2c 2d
thread 3 executes chunks 3a 3b 3c 3d 3e 2e
thread 4 executes chunks 4a 4b 4c 4e
In this scenario, thread 1 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 4. Similarly, thread 3 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 2.
Example 3. Given the following information:
number of iterations = 1000
number of threads    = 4
and using the DYNAMIC scheduling type and chunk size of 100, the chunk sizes would be as follows:
100 100 100 100 100 100 100 100 100 100
The iterations would be divided into the following chunks:
chunk 1  = iterations 1 to 100
chunk 2  = iterations 101 to 200
chunk 3  = iterations 201 to 300
chunk 4  = iterations 301 to 400
chunk 5  = iterations 401 to 500
chunk 6  = iterations 501 to 600
chunk 7  = iterations 601 to 700
chunk 8  = iterations 701 to 800
chunk 9  = iterations 801 to 900
chunk 10 = iterations 901 to 1000
A possible scenario for the division of work could be:
thread 1 executes chunks 1 5 9
thread 2 executes chunks 2 8
thread 3 executes chunks 3 6 10
thread 4 executes chunks 4 7
Example 4. Given the following information:
number of iterations = 100
number of threads    = 4
and using the STATIC scheduling type, the iterations would be divided into the following chunks:
chunk 1 = iterations 1 to 25
chunk 2 = iterations 26 to 50
chunk 3 = iterations 51 to 75
chunk 4 = iterations 76 to 100
A possible scenario for the division of work could be:
thread 1 executes chunks 1
thread 2 executes chunks 2
thread 3 executes chunks 3
thread 4 executes chunks 4
Related Information
Purpose
The SECTIONS construct defines distinct blocks of code to be executed in parallel by threads in the team.
The SECTIONS and END SECTIONS directives only take effect if you specify the -qsmp compiler option.
Syntax
               .----------------------------.
               V                            |
>>-SECTIONS----+------------------------+-+--------------------><
               '-+---+--sections_clause-'
                 '-,-'

                .--------------------.
                V                    |
>>-+---------+--block----+----------------+-+------------------><
   '-SECTION-'           '-SECTION--block-'

>>-END SECTIONS--+--------+------------------------------------><
                 '-NOWAIT-'
where sections_clause is:
>>-+-firstprivate_clause-+-------------------------------------><
   +-lastprivate_clause--+
   +-private_clause------+
   '-reduction_clause----'
Rules
The SECTIONS construct must be encountered by all threads in a team or by none of the threads in a team. All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.
The SECTIONS construct includes the delimiting directives, and the blocks of code they enclose. At least one block of code must appear in the construct.
You must specify the SECTION directive at the beginning of each block of code except for the first. The end of a block is delimited by either another SECTION directive or by the END SECTIONS directive.
It is illegal to branch into or out of any block of code that is enclosed in the SECTIONS construct. All SECTION directives must appear within the lexical extent of the SECTIONS/END SECTIONS directive pair.
The compiler determines how to divide the work among the threads based on a number of factors, such as the number of threads in the team and the number of sections to be executed in parallel. Therefore, a single thread might execute more than one SECTION. It is also possible that a thread in the team might not execute any SECTION.
In order for the directive to execute in parallel, you must place the SECTIONS/END SECTIONS pair within the dynamic extent of a parallel region. Otherwise, the blocks will be executed serially.
If you specify NOWAIT on the SECTIONS directive, a thread that completes its sections early will proceed to the instructions following the SECTIONS construct. If you do not specify the NOWAIT clause, each thread will wait for all of the other threads in the same team to reach the END SECTIONS directive. However, there is no implied BARRIER at the start of the SECTIONS construct.
You cannot specify a SECTIONS directive within the dynamic extent of a CRITICAL or MASTER directive.
You cannot nest SECTIONS, DO or SINGLE directives that bind to the same PARALLEL directive.
BARRIER and MASTER directives are not permitted in the dynamic extent of a SECTIONS directive.
The END SECTIONS directive implies the FLUSH directive.
Examples
Example 1: This example shows a valid use of the SECTIONS construct within a PARALLEL region.
      INTEGER :: I, B(500), S, SUM
! ...
      S = 0
      SUM = 0
!$OMP PARALLEL SHARED(SUM), FIRSTPRIVATE(S)
!$OMP SECTIONS REDUCTION(+: SUM), LASTPRIVATE(I)
!$OMP SECTION
        S = FCT1(B(1::2))            ! Array B is not altered in FCT1.
        SUM = SUM + S
! ...
!$OMP SECTION
        S = FCT2(B(2::2))            ! Array B is not altered in FCT2.
        SUM = SUM + S
! ...
!$OMP SECTION
        DO I = 1, 500                ! The local copy of S is initialized
          S = S + B(I)               ! to zero.
        END DO
        SUM = SUM + S
! ...
!$OMP END SECTIONS
! ...
!$OMP DO REDUCTION(-: SUM)
      DO J=I-1, 1, -1                ! The loop starts at 500 -- the last
                                     ! value from the previous loop.
        SUM = SUM - B(J)
      END DO
!$OMP MASTER
      SUM = SUM - FCT1(B(1::2)) - FCT2(B(2::2))
!$OMP END MASTER
!$OMP END PARALLEL
! ...
                                     ! Upon termination of the PARALLEL
                                     ! region, the value of SUM remains zero.
Example 2: This example shows a valid use of nested SECTIONS.
!$OMP PARALLEL
!$OMP MASTER
      CALL RANDOM_NUMBER(CX)
      CALL RANDOM_NUMBER(CY)
      CALL RANDOM_NUMBER(CZ)
!$OMP END MASTER
!$OMP SECTIONS
!$OMP SECTION
!$OMP   PARALLEL
!$OMP   SECTIONS PRIVATE(I)
!$OMP   SECTION
        DO I=1, 5000
          X(I) = X(I) + CX
        END DO
!$OMP   SECTION
        DO I=1, 5000
          Y(I) = Y(I) + CY
        END DO
!$OMP   END SECTIONS
!$OMP   END PARALLEL
!$OMP SECTION
!$OMP   PARALLEL SHARED(CZ,Z)
!$OMP   DO
        DO I=1, 5000
          Z(I) = Z(I) + CZ
        END DO
!$OMP   END DO
!$OMP   END PARALLEL
!$OMP END SECTIONS NOWAIT
                                 ! The following computations do not
                                 ! depend on the results from the
                                 ! previous section.
!$OMP DO
      DO I=1, 5000
        T(I) = T(I) * CT
      END DO
!$OMP END DO
!$OMP END PARALLEL
Related Information
Purpose
You can use the SINGLE / END SINGLE directive construct to specify that the enclosed code should only be executed by one thread in the team.
The SINGLE directive only takes effect if you specify the -qsmp compiler option.
Syntax
             .--------------------------.
             V                          |
>>-SINGLE----+----------------------+-+------------------------><
             '-+---+--single_clause-'
               '-,-'

>>-block-------------------------------------------------------><

>>-END SINGLE--+-------------------+---------------------------><
               +-NOWAIT------------+
               '-end_single_clause-'
where single_clause is:
>>-+-private_clause------+-------------------------------------><
   '-firstprivate_clause-'
where end_single_clause is:
   .---------------------------.
   V                           |
>>---copyprivate_clause--+---+-+-------------------------------><
                         '-,-'
Rules
It is illegal to branch into or out of a block that is enclosed within the SINGLE construct.
The SINGLE construct must be encountered by all threads in a team or by none of the threads in a team. All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.
If you specify NOWAIT on the END SINGLE directive, the threads that are not executing the SINGLE construct will proceed to the instructions following the SINGLE construct. If you do not specify the NOWAIT clause, each thread will wait at the END SINGLE directive until the thread executing the construct reaches the END SINGLE directive. You may not specify NOWAIT and COPYPRIVATE as part of the same END SINGLE directive.
There is no implied BARRIER at the start of the SINGLE construct. If you do not specify the NOWAIT clause, the BARRIER directive is implied at the END SINGLE directive.
You cannot nest SECTIONS, DO and SINGLE directives inside one another if they bind to the same PARALLEL directive.
SINGLE directives are not permitted within the dynamic extent of CRITICAL and MASTER directives. BARRIER and MASTER directives are not permitted within the dynamic extent of SINGLE directives.
If you have specified a variable as PRIVATE, FIRSTPRIVATE, LASTPRIVATE or REDUCTION in the PARALLEL construct which encloses your SINGLE construct, you cannot specify the same variable in the PRIVATE or FIRSTPRIVATE clause of the SINGLE construct.
The SINGLE directive binds to the closest dynamically enclosing PARALLEL directive, if one exists.
Examples
Example 1: In this example, the BARRIER directive is used to ensure that all threads finish their work before entering the SINGLE construct.
      REAL :: X(100), Y(50)
! ...
!$OMP PARALLEL DEFAULT(SHARED)
      CALL WORK(X)
!$OMP BARRIER
!$OMP SINGLE
      CALL OUTPUT(X)
      CALL INPUT(Y)
!$OMP END SINGLE
      CALL WORK(Y)
!$OMP END PARALLEL
Example 2: In this example, the SINGLE construct ensures that only one thread is executing a block of code. In this case, array B is initialized in the DO (work-sharing) construct. After the initialization, a single thread is employed to perform the summation.
      INTEGER :: I, J
      REAL :: B(500,500), SM
! ...
      J = ...
      SM = 0.0
!$OMP PARALLEL
!$OMP DO PRIVATE(I)
      DO I=1, 500
        CALL INITARR(B(I,:), I)        ! initialize the array B
      ENDDO
!$OMP END DO
!$OMP SINGLE                           ! employ only one thread
      DO I=1, 500
        SM = SM + SUM(B(J:J+1,I))
      ENDDO
!$OMP END SINGLE
!$OMP DO PRIVATE(I)
      DO I=500, 1, -1
        CALL INITARR(B(I,:), 501-I)    ! re-initialize the array B
      ENDDO
!$OMP END PARALLEL
Example 3: This example shows a valid use of the PRIVATE clause. Array X is PRIVATE to the SINGLE construct. If you were to reference array X immediately following the construct, it would be undefined.
      REAL :: X(2000), A(1000), B(1000)
!$OMP PARALLEL
! ...
!$OMP SINGLE PRIVATE(X)
      CALL READ_IN_DATA(X)
      A = X(1::2)
      B = X(2::2)
!$OMP END SINGLE
! ...
!$OMP END PARALLEL
Example 4: In this example, the LASTPRIVATE variable I is used in allocating TMP, the PRIVATE variable in the SINGLE construct.
      SUBROUTINE ADD(A, UPPERBOUND)
        INTEGER :: A(UPPERBOUND), I, UPPERBOUND
        INTEGER, ALLOCATABLE :: TMP(:)
! ...
!$OMP   PARALLEL
!$OMP   DO LASTPRIVATE(I)
        DO I=1, UPPERBOUND
          A(I) = I + 1
        ENDDO
!$OMP   END DO
!$OMP   SINGLE FIRSTPRIVATE(I), PRIVATE(TMP)
        ALLOCATE(TMP(0:I-1))
        TMP = (/ (A(J),J=I,1,-1) /)
! ...
        DEALLOCATE(TMP)
!$OMP   END SINGLE
!$OMP   END PARALLEL
! ...
      END SUBROUTINE ADD
Example 5: In this example, a value for the variable I is entered by the user. This value is then copied into the corresponding variable I for all other threads in the team using a COPYPRIVATE clause on an END SINGLE directive.
      INTEGER I
!$OMP PARALLEL PRIVATE (I)
! ...
!$OMP SINGLE
      READ (*, *) I
!$OMP END SINGLE COPYPRIVATE (I)
                            ! In all threads in the team, I
                            ! is equal to the value
! ...                       ! that you entered.
!$OMP END PARALLEL
Example 6: In this example, variable J with a POINTER attribute is specified in a COPYPRIVATE clause on an END SINGLE directive. The value of J, not the value of the object that it points to, is copied into the corresponding variable J for all other threads in the team. The object itself is shared among all the threads in the team.
      INTEGER, POINTER :: J
!$OMP PARALLEL PRIVATE (J)
! ...
!$OMP SINGLE
      ALLOCATE (J)
      READ (*, *) J
!$OMP END SINGLE COPYPRIVATE (J)
!$OMP ATOMIC
      J = J + OMP_GET_THREAD_NUM()
!$OMP BARRIER
!$OMP SINGLE
      WRITE (*, *) 'J = ', J
                            ! The result is the sum of all values added to
                            ! J. This result shows that the pointer object
                            ! is shared by all threads in the team.
      DEALLOCATE (J)
!$OMP END SINGLE
!$OMP END PARALLEL
Related Information
Purpose
You can use the THREADLOCAL directive to declare thread-specific common data. It is a possible method of ensuring that access to data that is contained within COMMON blocks is serialized.
In order to make use of this directive, it is not necessary to specify the -qsmp compiler option, but you must use the xlf_r, xlf90_r, or xlf95_r invocation command to link the necessary libraries.
Syntax
                          .-,-----------------------.
                          V                         |
>>-THREADLOCAL--+----+------/--common_block_name--/-+----------><
                '-::-'
Rules
You can only declare named blocks as THREADLOCAL. All rules and constraints that normally apply to named common blocks apply to common blocks that are declared as THREADLOCAL. See COMMON for more information on the rules and constraints that apply to named common blocks.
The THREADLOCAL directive must appear in the specification_part of the scoping unit. If a common block appears in a THREADLOCAL directive, it must also be declared within a COMMON statement in the same scoping unit. The THREADLOCAL directive may occur before or after the COMMON statement. See Main Program for more information on the specification_part of the scoping unit.
A common block cannot be given the THREADLOCAL attribute if it is declared within a PURE subprogram.
Members of a THREADLOCAL common block must not appear in NAMELIST statements.
A common block that is use-associated must not be declared as THREADLOCAL in the scoping unit that contains the USE statement.
Any pointers declared in a THREADLOCAL common block are not affected by the -qinit=f90ptr compiler option.
Objects within THREADLOCAL common blocks may be used in parallel loops and parallel sections. However, these objects are implicitly shared across the iterations of the loop, and across code blocks within parallel sections. In other words, within a scoping unit, all accessible common blocks, whether declared as THREADLOCAL or not, have the SHARED attribute within parallel loops and sections in that scoping unit.
If a common block is declared as THREADLOCAL within a scoping unit, any subprogram that declares or references the common block, and that is directly or indirectly referenced by the scoping unit, must be executed by the same thread executing the scoping unit. If two procedures that declare common blocks are executed by different threads, then they would obtain different copies of the common block, provided that the common block had been declared THREADLOCAL. Threads can be created in one of the following ways:
If a common block is declared to be THREADLOCAL in one scoping unit, it must be declared to be THREADLOCAL in every scoping unit that declares the common block.
If a THREADLOCAL common block that does not have the SAVE attribute is declared within a subprogram, the members of the block become undefined at subprogram RETURN or END, unless there is at least one other scoping unit in which the common block is accessible that is making a direct or indirect reference to the subprogram.
You cannot specify the same common_block_name for both a THREADLOCAL directive and a THREADPRIVATE directive.
Example 1: The following procedure "FORT_SUB" is invoked by two threads:
      SUBROUTINE FORT_SUB(IARG)
        INTEGER IARG
        CALL LIBRARY_ROUTINE1()
        CALL LIBRARY_ROUTINE2()
        ...
      END SUBROUTINE FORT_SUB
      SUBROUTINE LIBRARY_ROUTINE1()
        COMMON /BLOCK/ R           ! The SAVE attribute is required for the
        SAVE /BLOCK/               ! common block because the program requires
                                   ! that the block remain defined after
!IBM* THREADLOCAL /BLOCK/          ! library_routine1 is invoked.
        R = 1.0
        ...
      END SUBROUTINE LIBRARY_ROUTINE1
      SUBROUTINE LIBRARY_ROUTINE2()
        COMMON /BLOCK/ R
        SAVE /BLOCK/
!IBM* THREADLOCAL /BLOCK/
        ... = R
        ...
      END SUBROUTINE LIBRARY_ROUTINE2
Example 2: "FORT_SUB" is invoked by multiple threads. This is an invalid example because "FORT_SUB" and "ANOTHER_SUB" both declare /BLOCK/ to be THREADLOCAL. They intend to share the common block, but they are executed by different threads.
      SUBROUTINE FORT_SUB()
        COMMON /BLOCK/ J
        INTEGER :: J
!IBM* THREADLOCAL /BLOCK/          ! Each thread executing FORT_SUB
                                   ! obtains its own copy of /BLOCK/.
        INTEGER A(10)
        ...
!IBM* INDEPENDENT
        DO INDEX = 1,10
          CALL ANOTHER_SUB(A(INDEX))
        END DO
        ...
      END SUBROUTINE FORT_SUB
      SUBROUTINE ANOTHER_SUB(AA)   ! Multiple threads are used to execute ANOTHER_SUB.
        INTEGER AA
        COMMON /BLOCK/ J           ! Each thread obtains a new copy of the
        INTEGER :: J               ! common block /BLOCK/.
!IBM* THREADLOCAL /BLOCK/
        ...
        AA = J                     ! The value of 'J' is undefined.
      END SUBROUTINE ANOTHER_SUB
Related Information
Purpose
The THREADPRIVATE directive allows you to specify named common blocks and named variables as private to a thread but global within that thread. Once you declare a common block or variable THREADPRIVATE, each thread in the team maintains a separate copy of that common block or variable. Data written to a THREADPRIVATE common block or variable remains private to that thread and is not visible to other threads in the team.
In the serial and MASTER sections of a program, only the master thread's copy of the named common block and variable is accessible.
Use the COPYIN clause on the PARALLEL, PARALLEL DO, PARALLEL SECTIONS or PARALLEL WORKSHARE directives to specify that upon entry into a parallel region, data in the master thread's copy of a named common block or named variable is copied to each thread's private copy of that common block or variable.
The THREADPRIVATE directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-THREADPRIVATE--(--threadprivate_entity_list--)--------------><

where threadprivate_entity_list is:

>>-+-variable_name---------+-----------------------------------><
   '-/ common_block_name /-'
Rules
You cannot specify a THREADPRIVATE variable, common block, or the variables that comprise that common block in a PRIVATE, FIRSTPRIVATE, LASTPRIVATE, SHARED, or REDUCTION clause.
A THREADPRIVATE variable must have the SAVE attribute. For variables or common blocks declared in the scope of a module, the SAVE attribute is implied. If you declare the variable outside of the scope of the module, the SAVE attribute must be specified.
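For example, a sketch of a THREADPRIVATE variable declared outside a module (the subroutine and variable names are assumed), where the SAVE attribute must be given explicitly:

      SUBROUTINE ACCUMULATE(X)
        REAL :: X
        REAL, SAVE :: TOTAL = 0.0      ! SAVE is required outside a module
!$OMP   THREADPRIVATE(TOTAL)
        TOTAL = TOTAL + X              ! each thread updates its own copy
      END SUBROUTINE ACCUMULATE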
In THREADPRIVATE directives, you can only specify named variables and named common blocks.
A variable can only appear in a THREADPRIVATE directive in the scope in which it is declared, and a THREADPRIVATE variable or common block may only appear once in a given scope. The variable must not be an element of a common block, or be declared in an EQUIVALENCE statement.
You cannot specify the same common_block_name for both a THREADPRIVATE directive and a THREADLOCAL directive.
All rules and constraints that apply to named common blocks also apply to common blocks declared as THREADPRIVATE. See COMMON.
If you declare a common block as THREADPRIVATE in one scoping unit, you must declare it as THREADPRIVATE in all other scoping units in which it is declared.
On entry into any parallel region, a THREADPRIVATE variable, or a variable in a THREADPRIVATE common block is subject to the following criteria when declared in a COPYIN clause:
On entry into the first parallel region of the program, THREADPRIVATE variables or variables within a THREADPRIVATE common block not specified in a COPYIN clause are subject to the following criteria:
On entry into subsequent parallel regions of the program, THREADPRIVATE variables, or variables within a THREADPRIVATE common block not specified in a COPYIN clause, are subject to the following criteria:
You cannot access the name of a common block by use association or host association. Thus, a named common block can only appear on a THREADPRIVATE directive if the common block is declared in the scoping unit that contains the THREADPRIVATE directive. However, you can access the variables in the common block by use association or host association. For more information, see Host Association and Use Association.
The -qinit=f90ptr compiler option does not affect pointers that you have declared in a THREADPRIVATE common block.
The DEFAULT clause does not affect variables in THREADPRIVATE common blocks.
Examples
Example 1: In this example, the PARALLEL DO directive invokes multiple threads that call SUB1. The common block BLK in SUB1 shares the data that is specific to the thread with subroutine SUB2, which is called by SUB1.
      PROGRAM TT
        INTEGER :: I, B(50)
!$OMP   PARALLEL DO SCHEDULE(STATIC, 10)
        DO I=1, 50
          CALL SUB1(I, B(I))        ! Multiple threads call SUB1.
        ENDDO
      END PROGRAM TT

      SUBROUTINE SUB1(J, X)
        INTEGER :: J, X, A(100)
        COMMON /BLK/ A
!$OMP   THREADPRIVATE(/BLK/)        ! Array A is private to each thread.
! ...
        CALL SUB2(J)
        X = A(J) + A(J + 50)
! ...
      END SUBROUTINE SUB1

      SUBROUTINE SUB2(K)
        INTEGER :: C(100)
        COMMON /BLK/ C
!$OMP   THREADPRIVATE(/BLK/)
! ...
        C = K                       ! Since each thread has its own copy of
                                    ! common block BLK, the assignment of
                                    ! array C has no effect on the copies of
                                    ! that block owned by other threads.
! ...
      END SUBROUTINE SUB2
Example 2: In this example, each thread has its own copy of the common block ARR in the parallel section. If one thread initializes the common block variable TEMP, the initial value is not visible to other threads.
      PROGRAM ABC
        INTEGER :: I, TEMP(100), ARR1(50), ARR2(50)
        COMMON /ARR/ TEMP
!$OMP   THREADPRIVATE(/ARR/)

        INTERFACE
          SUBROUTINE SUBS(X)
            INTEGER :: X(:)
          END SUBROUTINE
        END INTERFACE
! ...
!$OMP   PARALLEL SECTIONS
!$OMP   SECTION                     ! The thread has its own copy of the
! ...                               ! common block ARR.
        TEMP(1:100:2) = -1
        TEMP(2:100:2) = 2
        CALL SUBS(ARR1)
! ...
!$OMP   SECTION                     ! The thread has its own copy of the
! ...                               ! common block ARR.
        TEMP(1:100:2) = 1
        TEMP(2:100:2) = -2
        CALL SUBS(ARR2)
! ...
!$OMP   END PARALLEL SECTIONS
! ...
        PRINT *, SUM(ARR1), SUM(ARR2)
      END PROGRAM ABC

      SUBROUTINE SUBS(X)
        INTEGER :: K, X(:), TEMP(100)
        COMMON /ARR/ TEMP
!$OMP   THREADPRIVATE(/ARR/)
! ...
        DO K = 1, UBOUND(X, 1)
          X(K) = TEMP(K) + TEMP(K + 1)  ! The thread is accessing its
                                        ! own copy of
                                        ! the common block.
        ENDDO
! ...
      END SUBROUTINE SUBS
The expected output for this program is:
50 -50
Example 3: In the following example, local variables outside of a common block are declared THREADPRIVATE.
      MODULE MDL
        INTEGER :: A(2)
        INTEGER, POINTER :: P
        INTEGER, TARGET :: T
!$OMP   THREADPRIVATE(A, P)
      END MODULE MDL

      PROGRAM MVAR
        USE MDL
        INTEGER :: I
        INTEGER OMP_GET_THREAD_NUM

        CALL OMP_SET_NUM_THREADS(2)
        A = (/1, 2/)
        T = 4
        P => T

!$OMP   PARALLEL PRIVATE(I) COPYIN(A, P)
        I = OMP_GET_THREAD_NUM()
        IF (I .EQ. 0) THEN
          A(1) = 100
          T = 5
        ELSE IF (I .EQ. 1) THEN
          A(2) = 200
        END IF
!$OMP   END PARALLEL

!$OMP   PARALLEL PRIVATE(I)
        I = OMP_GET_THREAD_NUM()
        IF (I .EQ. 0) THEN
          PRINT *, 'A(2) = ', A(2)
        ELSE IF (I .EQ. 1) THEN
          PRINT *, 'A(1) = ', A(1)
          PRINT *, 'P => ', P
        END IF
!$OMP   END PARALLEL
      END PROGRAM MVAR
If the dynamic threads mechanism is disabled, the expected output is one of the following:
A(2) = 2
A(1) = 1
P => 5

or

A(1) = 1
P => 5
A(2) = 2
Related Information
Purpose
The WORKSHARE directive allows you to parallelize the execution of array operations. A WORKSHARE directive divides the tasks associated with an enclosed block of code into units of work. When a team of threads encounters a WORKSHARE directive, the threads in the team share the tasks, so that each unit of work executes exactly once.
The WORKSHARE directive only takes effect if you specify the -qsmp compiler option.
Syntax
>>-WORKSHARE---------------------------------------------------><

>>-block-------------------------------------------------------><

>>-END WORKSHARE--+--------+-----------------------------------><
                  '-NOWAIT-'
The transformational intrinsic functions that you can use as part of an array operation are:

ALL           ANY           COUNT         CSHIFT
DOT_PRODUCT   EOSHIFT       MATMUL        MAXLOC
MAXVAL        MINLOC        MINVAL        PACK
PRODUCT       RESHAPE       SPREAD        SUM
TRANSPOSE     UNPACK
The block can also contain statements bound to lexically enclosed PARALLEL constructs. These statements are not restricted.
Any user-defined function calls within the block must be elemental.
Statements enclosed in a WORKSHARE directive are divided into units of work. The definition of a unit of work varies according to the statement evaluated. A unit of work is defined as follows:
If none of the above definitions apply to a statement within the block, then that statement is a unit of work.
Rules
In order to ensure that the statements within a WORKSHARE construct execute in parallel, the construct must be enclosed within the dynamic extent of a parallel region. Threads encountering a WORKSHARE construct outside the dynamic extent of a parallel region will evaluate the statements within the construct serially.
A WORKSHARE directive binds to the closest dynamically enclosing PARALLEL directive if one exists.
You must not nest DO, SECTIONS, SINGLE and WORKSHARE directives that bind to the same PARALLEL directive
You must not specify a WORKSHARE directive within the dynamic extent of CRITICAL, MASTER, or ORDERED directives.
You must not specify BARRIER, MASTER, or ORDERED directives within the dynamic extent of a WORKSHARE construct.
If an array assignment, scalar assignment, a masked array assignment or a FORALL assignment assigns to a private variable in the block, the result is undefined.
If an array expression in the block references the value, association status or allocation status of private variables, the value of the expression is undefined unless each thread computes the same value.
If you do not specify a NOWAIT clause at the end of a WORKSHARE construct, a BARRIER directive is implied.
A WORKSHARE construct must be encountered by all threads in the team or by none at all.
Examples
Example 1: In the following example, the WORKSHARE directive evaluates the masked expressions in parallel.
!$OMP WORKSHARE
      FORALL (I = 1 : N, AA(1, I) == 0) AA(1, I) = I
      BB = TRANSPOSE(AA)
      CC = MATMUL(AA, BB)
!$OMP ATOMIC
      S = S + SUM(CC)
!$OMP END WORKSHARE
Example 2: The following example includes a user-defined ELEMENTAL function as part of a WORKSHARE construct.
!$OMP WORKSHARE
      WHERE (AA(1, :) /= 0.0) AA(1, :) = 1 / AA(1, :)
      DD = TRANS(AA(1, :))
!$OMP END WORKSHARE

      ELEMENTAL REAL FUNCTION TRANS(ELM) RESULT(RES)
        REAL, INTENT(IN) :: ELM
        RES = ELM * ELM + 4
      END FUNCTION
Related Information