Performance and optimization

Many new features and enhancements fall into the category of optimization and performance tuning.

Architecture and processor-specific code tuning

The -qarch compiler option controls the particular instructions that are generated for the specified machine architecture. The -qtune compiler option adjusts the instructions, scheduling, and other optimizations to enhance performance on the specified hardware. These options work together to generate application code that gives the best performance for the specified architecture.

XL C/C++ V8.0 extends the list of -qarch suboptions to cover processors that implement the VMX instruction set and the newly available POWER5+ processors. The following new -qarch options are available:

High performance libraries

XL C/C++ includes highly-tuned mathematical functions that can greatly improve the performance of mathematically-intensive applications. These functions are provided through the following high-performance libraries:

Mathematical Acceleration Subsystem (MASS)
MASS libraries provide high-performance scalar and vector functions to perform common mathematical computations. The MASS libraries included with XL C/C++ Advanced Edition V8.0 for Linux introduce new scalar and vector functions, and new support for the POWER5 processor architecture.

For more information about using the MASS libraries, see Using the Mathematical Acceleration Subsystem.

Basic Linear Algebra Subprograms (BLAS)
XL C/C++ Advanced Edition V8.0 for Linux introduces the BLAS set of high-performance algebraic functions. You can use these functions to:

For more information about using the BLAS functions, see Using the Basic Linear Algebra Subprograms.

Other performance-related compiler options and directives

The following table describes new or changed compiler options and directives not already mentioned in the sections above.

Information presented here is just a brief overview. For more information about these compiler options, refer to Options for performance optimization.

Table 2. Other Performance-Related Compiler Options and Directives
Option/directive Description
-qfloat -qfloat adds the following new suboptions:
-qfloat=relax
This suboption relaxes strict-IEEE conformance in exchange for greater speed, typically by removing trivial floating-point arithmetic operations such as adds and subtracts involving a zero on the right.
-qfloat=norelax
This is the default. Strict IEEE conformance is maintained.
-qipa -qipa adds the following new suboptions:
-qipa=clonearch=arch{,arch}
Specifies one or more processor architectures for which multiple versions of the same instruction set are produced.

XL C/C++ lets you specify multiple specific processor architectures for which instruction sets will be generated. At run time, the application will detect the specific architecture of the operating environment and select the instruction set specialized for that architecture.

-qipa=cloneproc=name{,name}
Specifies the names of one or more functions to clone for the processor architectures specified by the clonearch suboption.
-O Specifying the -O3 compiler option now instructs the compiler to also assume the -qhot=level=0 compiler option setting.

Specifying the -O4 or -O5 compiler option now instructs the compiler to also assume the -qhot=level=1 compiler option setting.
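The trade-off behind -qfloat=relax can be illustrated with signed zeros. Under IEEE 754 round-to-nearest arithmetic, adding +0.0 to -0.0 yields +0.0, so an add of zero is not always a no-op. The sketch below models the two outcomes by hand; the function names are hypothetical, it assumes double is a 64-bit IEEE 754 value, and it does not claim to show exactly which transformations the compiler applies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if x is negative zero, that is, only the
   sign bit is set. Assumes double is a 64-bit IEEE 754 binary64 value. */
int is_negative_zero(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits == UINT64_C(0x8000000000000000);
}

/* Models strict evaluation: the add is really performed at run time. */
double add_zero_strict(double x)
{
    volatile double zero = 0.0;  /* volatile keeps the add from being folded away */
    return x + zero;
}

/* Models the simplified form, where "x + 0.0" has been reduced to "x". */
double add_zero_relaxed(double x)
{
    return x;
}
```

For an input of -0.0, add_zero_strict returns +0.0 while add_zero_relaxed returns -0.0; for ordinary nonzero inputs the two agree. Code that depends on the sign of zero is therefore a candidate for keeping the norelax default.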

Built-in functions new for this release

The following table lists built-in functions that are new for this release. For more information on built-in functions provided by XL C/C++, see Built-in functions for POWER and PowerPC architectures.

Table 3. Built-in functions for XL C/C++
Function Description
void* __builtin_return_address (unsigned int level); Returns the return address of the current function, or of one of its callers, where level is a constant literal indicating the number of frames to scan up the call stack.
void* __builtin_frame_address (unsigned int level); Returns the address of the function frame of the current function, or of one of its callers, where level is a constant literal indicating the number of frames to scan up the call stack.
int __compare_and_swap(volatile int* addr, int* old_val_addr, int new_val); Performs an atomic operation which compares the contents of a single word variable with a stored old value.
int __compare_and_swaplp(volatile long* addr, long* old_val_addr, long new_val); Performs an atomic operation which compares the contents of a double word variable with a stored old value.
int __fetch_and_add(volatile int* addr, int val); Increments the single word specified by addr by the amount specified by val in a single atomic operation.
long __fetch_and_addlp(volatile long* addr, long val); Increments the double word specified by addr by the amount specified by val in a single atomic operation.
unsigned int __fetch_and_and(volatile unsigned int* addr, unsigned int val); Clears bits in the single word specified by addr by AND-ing that value with the input val parameter, in a single atomic operation.
unsigned long __fetch_and_andlp(volatile unsigned long* addr, unsigned long val); Clears bits in the double word specified by addr by AND-ing that value with the input val parameter, in a single atomic operation.
unsigned int __fetch_and_or(volatile unsigned int* addr, unsigned int val); Sets bits in the single word specified by addr by OR-ing that value with the input val parameter, in a single atomic operation.
unsigned long __fetch_and_orlp(volatile unsigned long* addr, unsigned long val); Sets bits in the double word specified by addr by OR-ing that value with the input val parameter, in a single atomic operation.
unsigned int __fetch_and_swap(volatile unsigned int* addr, unsigned int val); Sets the single word specified by addr to the value of the input val parameter and returns the original contents of the memory location, in a single atomic operation.
double __frim(double val); Takes an input val in double format, rounds val down to the next lower integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frims(float val); Takes an input val in float format, rounds val down to the next lower integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __frin(double val); Takes an input val in double format, rounds val to the nearest integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frins(float val); Takes an input val in float format, rounds val to the nearest integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __frip(double val); Takes an input val in double format, rounds val up to the next higher integral value, and returns the result in double format. Valid only for POWER5+ processors.
float __frips(float val); Takes an input val in float format, rounds val up to the next higher integral value, and returns the result in float format. Valid only for POWER5+ processors.
double __friz(double val); Takes an input val in double format, rounds val to the next integral value closest to zero, and returns the result in double format. Valid only for POWER5+ processors.
float __frizs(float val); Takes an input val in float format, rounds val to the next integral value closest to zero, and returns the result in float format. Valid only for POWER5+ processors.
long __ldarx(volatile long* addr); Generates a Load Double Word And Reserve Indexed (ldarx) instruction. This instruction can be used in conjunction with a subsequent stdcx. instruction to implement a read-modify-write on a specified memory location.
int __lwarx(volatile int* addr); Generates a Load Word And Reserve Indexed (lwarx) instruction. This instruction can be used in conjunction with a subsequent stwcx. instruction to implement a read-modify-write on a specified memory location.
int __stdcx(volatile long* addr, long val); Generates a Store Double Word Conditional Indexed (stdcx.) instruction. This instruction can be used in conjunction with a preceding ldarx instruction to implement a read-modify-write on a specified memory location.
int __stwcx(volatile int* addr, int val); Generates a Store Word Conditional Indexed (stwcx.) instruction. This instruction can be used in conjunction with a preceding lwarx instruction to implement a read-modify-write on a specified memory location.
unsigned long __mftb(); Generates a Move From Time Base (mftb) hardware instruction.
unsigned int __mftbu(); Generates a Move From Time Base Upper (mftbu) hardware instruction.
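As an illustration of how the atomic built-ins in Table 3 might be used, the following is a minimal sketch, not a definitive implementation. The helper names next_ticket and store_max are hypothetical. It assumes that __fetch_and_add returns the value held before the increment and that __stwcx returns nonzero when the conditional store succeeds; neither return convention is stated in the table above. So that the sketch can also be built with other compilers, it falls back to the GCC/Clang __atomic built-ins when the XL predefined macros __IBMC__ and __xlC__ are absent.

```c
/* Assumption: __IBMC__ / __xlC__ identify the XL compilers. Elsewhere the
   sketch uses GCC/Clang __atomic built-ins as a stand-in. */
#if defined(__IBMC__) || defined(__xlC__)
#define HAVE_XL_BUILTINS 1
#else
#define HAVE_XL_BUILTINS 0
#endif

/* Hypothetical helper: hands out consecutive ticket numbers. */
int next_ticket(volatile int *counter)
{
#if HAVE_XL_BUILTINS
    /* Assumed to return the value held before the increment. */
    return __fetch_and_add(counter, 1);
#else
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical helper: atomically raises *addr to at least candidate and
   returns the value that was there before. */
int store_max(volatile int *addr, int candidate)
{
    int old, desired;
#if HAVE_XL_BUILTINS
    do {
        old = __lwarx(addr);                    /* load word and reserve */
        desired = (old > candidate) ? old : candidate;
    } while (!__stwcx(addr, desired));          /* assumed: nonzero on success */
#else
    old = __atomic_load_n(addr, __ATOMIC_SEQ_CST);
    do {
        /* On failure the compare-exchange refreshes old, so desired is recomputed. */
        desired = (old > candidate) ? old : candidate;
    } while (!__atomic_compare_exchange_n(addr, &old, desired, 1,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
#endif
    return old;
}
```

The load-and-reserve loop retries whenever another processor writes the location between the load and the conditional store, which is what makes the read-modify-write appear atomic. The table does not describe these built-ins as implying any memory barrier, so code that relies on ordering between threads may need additional synchronization; the fallback path uses sequentially consistent ordering, which may be stronger than what the XL built-ins provide.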

Related Information