Appendix B. OpenMP compliance and support

The OpenMP Application Program Interface (API) is a portable, scalable programming model that provides a standard interface for developing multiplatform, shared-memory parallel applications in C, C++, and Fortran. The specification is defined by the OpenMP organization, a group of major computer hardware and software vendors, which includes IBM.

XL C/C++ is compliant with OpenMP Specification 2.0. The compiler recognizes and preserves the semantics of the OpenMP V2.0 directives, library functions, and environment variables described below, which allow you to create and manage parallel programs while maintaining portability.

To enable OpenMP parallel processing, you must specify the -qsmp compiler option.

OpenMP directives

Each directive starts with #pragma omp to reduce the potential for conflict with other pragma directives.

OpenMP directives in XL C/C++
Directive name    Description
parallel The parallel directive defines a parallel region, which is a region of the program that is to be executed by multiple threads in parallel.
for The for directive identifies an iterative work-sharing construct that specifies a region in which the iterations of the associated loop should be executed in parallel. The iterations of the for loop are distributed across threads that already exist.
sections The sections directive identifies a non-iterative work-sharing construct that specifies a set of constructs that are to be divided among threads in a team. Each section is executed once by a thread in the team.
single The single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread).
parallel for The parallel for directive is a shortcut form for a parallel region that contains a single for directive. The semantics are identical to explicitly specifying a parallel directive immediately followed by a for directive.
parallel sections The parallel sections directive provides a shortcut form for specifying a parallel region containing a single sections directive. The semantics are identical to explicitly specifying a parallel directive immediately followed by a sections directive.
master The master directive identifies a construct that specifies a structured block that is executed by the master thread of the team.
critical The critical directive identifies a construct that restricts execution of the associated structured block to a single thread at a time. An optional name may be used to identify the critical region. A thread waits at the beginning of a critical region until no other thread is executing a critical region with the same name. All unnamed critical directives map to the same unspecified name.
barrier The barrier directive synchronizes all the threads in a team. When encountered, each thread waits until all of the others have reached this point. After all threads have encountered the barrier, each thread begins executing the statements after the barrier directive in parallel.
atomic The atomic directive identifies a specific memory location that must be updated atomically and not be exposed to multiple, simultaneous writing threads.
flush The flush directive identifies a point at which the compiler ensures that all threads in a parallel region have the same view of specified objects in memory.
ordered The ordered directive identifies a structured block of code that must be executed in sequential order.
threadprivate The threadprivate directive declares file-scope, namespace-scope, or static block-scope variables to be private to a thread.

OpenMP data scope attribute clauses

Clauses may be specified on the directives to control the scope attributes of variables for the duration of the parallel or work-sharing constructs.

OpenMP data scope attribute clauses in XL C/C++
Clause name    Description
private The private clause declares the variables in the list to be private to each thread in a team.
firstprivate The firstprivate clause provides a superset of the functionality of the private clause: in addition, each thread's private copy is initialized with the value that the original variable has on entry to the construct.
lastprivate The lastprivate clause provides a superset of the functionality of the private clause: in addition, on exit from the construct, the value from the sequentially last iteration of the loop, or the lexically last section, is copied back to the original variable.
copyprivate The copyprivate clause provides an alternative to using a shared variable to broadcast a value to a team. The mechanism uses a private variable to broadcast a value from one team member to other members.
num_threads The num_threads clause provides the ability to request a specific number of threads for a parallel construct.
shared The shared clause shares variables that appear in the list among all the threads in a team. All threads within a team access the same storage area for shared variables.
reduction The reduction clause performs a reduction on the scalar variables that appear in list, with a specified operator.
default The default clause sets the default data scope attribute, shared or none, for all variables in the lexical extent of the parallel construct.

OpenMP library functions

OpenMP runtime library functions are included in the header <omp.h>. They include execution environment functions that can be used to control and query the parallel execution environment, and lock functions that can be used to synchronize access to data.

OpenMP runtime library functions in XL C/C++
Function name    Description
omp_set_num_threads Sets the number of threads to use for subsequent parallel regions.
omp_get_num_threads Returns the number of threads currently in the team executing the parallel region from which it is called.
omp_get_max_threads Returns the maximum value that can be returned by calls to omp_get_num_threads.
omp_get_thread_num Returns the thread number, within its team, of the thread executing the function. The master thread of the team is thread 0.
omp_get_num_procs Returns the maximum number of processors that could be assigned to the program.
omp_in_parallel Returns non-zero if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0.
omp_set_dynamic Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions.
omp_get_dynamic Returns non-zero if dynamic thread adjustment is enabled and returns 0 otherwise.
omp_set_nested Enables or disables nested parallelism.
omp_get_nested Returns non-zero if nested parallelism is enabled and 0 if it is disabled.
omp_init_lock Initializes a simple lock.
omp_destroy_lock Removes a simple lock.
omp_set_lock Waits until a simple lock is available.
omp_unset_lock Releases a simple lock.
omp_test_lock Tests a simple lock.
omp_init_nest_lock Initializes a nestable lock.
omp_destroy_nest_lock Removes a nestable lock.
omp_set_nest_lock Waits until a nestable lock is available.
omp_unset_nest_lock Releases a nestable lock.
omp_test_nest_lock Tests a nestable lock.
omp_get_wtick Returns the number of seconds between successive clock ticks.
omp_get_wtime Returns the elapsed wall-clock time in seconds.

OpenMP environment variables

OpenMP environment variables control the execution of parallel code. The names of environment variables must always be in upper case, while their values are not case-sensitive.

OpenMP environment variables in XL C/C++
Environment variable    Description and syntax
OMP_SCHEDULE

Sets the run-time schedule type and chunk size. Applies only to OpenMP directives that have the scheduling type set to runtime.

                 .-static--+------+-------.
                 |         '-,--n-'       |
>>-OMP_SCHEDULE=-+-+-affinity-+--+------+-+--------------------><
                 | +-dynamic--+  '-,--n-' |
                 | '-guided---'           |
                 '-runtime----------------'
 
 

where

affinity
An IBM extension valid for C only. Specifies that iterations of a loop are initially divided into local partitions of a size equal to the ceiling of the number of iterations divided by the number of threads: CEILING(number_of_iterations ÷ number_of_threads). Each local partition is further subdivided into chunks of a size equal to the ceiling of half of the number of iterations remaining in the local partition: CEILING(iterations_left_in_local_partition ÷ 2). When a thread becomes free, it takes the next chunk from its local partition. If no chunks are in the local partition, the thread takes an available chunk from a partition of another thread. If n is specified, each local partition is subdivided into chunks of size n. If n is not specified, the default value is 1.

dynamic
Specifies that iterations for a for loop should be divided into a series of chunks of size n and that the chunks are handled according to the following process. A thread waiting for an assignment is assigned a chunk of iterations, which it executes and then waits for its next assignment. This process is repeated until all chunks are assigned. If n is not specified, the default chunk size is 1.

guided
Specifies that iterations for a for loop should be assigned to threads in chunks with decreasing sizes and that the chunks are handled according to the following process. A thread that finishes its assigned chunk of iterations is dynamically assigned another chunk, until all chunks are assigned. If n is not specified, the default value for the initial chunk size is 1.

static
Specifies that iterations for a for loop should be divided into a series of chunks of size n and that the chunks are handled according to the following process. Available threads are assigned chunks in an order determined by the thread number. When n is not specified, the iteration space is divided into chunks that are approximately equal in size, with one chunk assigned to each thread.

n
Is a positive number, representing the chunk size.
OMP_DYNAMIC

Enables or disables dynamic adjustment of the number of threads available for the execution of parallel regions.

                .-true--.
>>-OMP_DYNAMIC=-+-false-+--------------------------------------><
 
 

where

true
Enables dynamic adjustment of the number of threads available.

false
Disables dynamic adjustment of the number of threads available.
OMP_NUM_THREADS

Sets the number of threads available for execution.

>>-OMP_NUM_THREADS=n-------------------------------------------><
 
 

where

n
Represents the number of threads.
OMP_NESTED

Enables or disables nested parallelism.

               .-true--.
>>-OMP_NESTED=-+-false-+---------------------------------------><
 
 

where

true
Enables nested parallelism.

false
Disables nested parallelism.

OpenMP implementation-defined behavior

The following information is not specified in the standard. Each implementation of the standard may have its own implementation-defined values.

Conditional Compilation
The _OPENMP macro is defined to 199810.

Scheduling
The schedule clause specifies how iterations of a for loop are divided among threads of the team. The possible OpenMP standard values are static, dynamic, guided, and runtime. In addition, IBM C adds the value affinity as an extension. In the absence of an explicitly defined schedule clause, the default schedule for XL C/C++ is static.
IBM Copyright 2003