XL C/C++ Advanced Edition now supports the following operating systems:
The new features and enhancements in XL C/C++ Advanced Edition fall into three categories: performance and optimization, conformance to industry standards, and ease of use.
Many new features and enhancements fall into the category of optimization and performance tuning.
Refinements to options -qarch and -qtune
The compiler option -qarch controls the particular instructions that are generated for the specified machine architecture. Option -qtune adjusts the instructions, scheduling, and other optimizations to enhance performance on the specified hardware. These options work together to generate application code that gives the best performance for the specified architecture. Skillful use of these options in combination is key to achieving maximal exploitation of IBM processors and hardware. The coordination of these options has been enhanced in this release to add support for the POWER5 and PowerPC 970 hardware platforms and for greater ease of use.
For a particular architecture specified by -qarch, compiling with the default -qtune suboption generates code that gives the best performance for that architecture. Option -qarch can now specify a group of architectures; compiling with -qtune=auto generates code that runs on all of the architectures in the specified group, but the instruction sequences will be those with the best performance on the architecture of the compiling machine.
The default setting for -qarch depends on the platform that you are using:
AltiVec (VMX) support
XL C/C++ for Linux supports the AltiVec programming model and provides additional features to ensure maximum compatibility with GNU C and C++ compilers. The AltiVec data types and related operations are available in 32- and 64-bit modes, wherever the architecture supports the PowerPC SIMD extension (also known as the VMX engine). The SIMD (Single Instruction, Multiple Data) instruction set enables higher utilization of microprocessor hardware. The compiler provides the ability to automatically enable SIMD vectorization at higher levels of optimization.
Support on POWER5 processors has been added for the following built-in
functions. All supported built-in functions are described in XL
C/C++ Compiler Reference.
Built-in functions for POWER5 processors | |
Function | Description |
---|---|
int __popcnt4(unsigned int); | Returns the number of bits set (=1) for a 32-bit integer. |
int __popcnt8 (unsigned long long); | Returns the number of bits set (=1) for a 64-bit integer. |
unsigned long __popcntb (unsigned long); | Counts the 1 bits in each byte of the source operand and places that count into the corresponding byte of the result. |
int __poppar4(unsigned int); | Returns 1 if an odd number of bits is set for a 32-bit integer. Otherwise, returns 0. |
int __poppar8 (unsigned long long); | Returns 1 if an odd number of bits is set for a 64-bit integer. Otherwise, returns 0. |
double __fre(double); | Returns the result of a floating-point reciprocal operation. The result is a double precision estimate of 1/x. |
float __frsqrtes(float); | Returns the result of a reciprocal square root operation. The result is a single precision estimate of the reciprocal of the square root of x. |
unsigned long __mfspr(const int); | Return a value in the specified special purpose register. |
void __mtspr(const int, unsigned long); | Set the special purpose register specified by const int. |
unsigned long __mfmsr(); | Return the machine state register. |
void __mtmsr(unsigned long); | Set the machine state register. |
void __protected_unlimited_stream_set_go(unsigned int direction, const void* addr, unsigned int ID); | Establish a protected stream of unlimited length that uses the identifier ID. The stream identifier should be within the range of 0 to 15. The stream begins with the cache line at addr. The stream fetches from either incremental memory addresses or decremental memory addresses, as specified by direction. For incremental memory addresses (that is, a forward direction), the value of direction is 1; for decremental memory addresses, the value of direction is 3. The stream is protected from being replaced by any hardware-detected streams. |
void __protected_stream_set(unsigned int direction, const void* addr, unsigned int ID); | Establish a protected stream of limited length that uses the identifier ID. The stream begins with the cache line at addr and subsequently fetches from either incremental memory addresses or decremental memory addresses, as specified by direction. The stream is protected from being replaced by any hardware-detected streams. |
void __protected_stream_count(unsigned int unit_cnt, unsigned int ID); | Set the number of cache lines for the limited-length protected stream identified by ID. The number of cache lines is specified by the parameter unit_cnt and should be within the range of 0 to 1023. |
void __protected_stream_go(); | Start to prefetch all limited-length protected streams. |
void __protected_stream_stop(unsigned int ID); | Stop prefetching the protected stream identified by ID. |
void __protected_stream_stop_all(); | Stop prefetching all protected streams. |
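The documented results of the bit-counting and parity built-ins in the table above can be modeled in portable C, which may be useful for checking values on a machine without POWER5 hardware. The following is a minimal sketch, not the compiler's implementation: the ref_ names are hypothetical, and ref_popcntb assumes a 64-bit operand, whereas __popcntb takes an unsigned long.

```c
#include <stdint.h>

/* Portable reference models (hypothetical names, not part of XL C/C++)
   for the documented results of the POWER5 bit-counting built-ins. */

/* Models __popcnt4: number of 1 bits in a 32-bit value. */
int ref_popcnt4(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;          /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Models __popcnt8: number of 1 bits in a 64-bit value. */
int ref_popcnt8(uint64_t x)
{
    return ref_popcnt4((uint32_t)(x >> 32)) + ref_popcnt4((uint32_t)x);
}

/* Models __popcntb for a 64-bit operand (an assumption): each byte of
   the result holds the count of 1 bits in the corresponding source byte. */
uint64_t ref_popcntb(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t byte = (uint32_t)((x >> (8 * i)) & 0xFFu);
        r |= (uint64_t)ref_popcnt4(byte) << (8 * i);
    }
    return r;
}

/* Models __poppar4 and __poppar8: 1 if an odd number of bits is set. */
int ref_poppar4(uint32_t x) { return ref_popcnt4(x) & 1; }
int ref_poppar8(uint64_t x) { return ref_popcnt8(x) & 1; }
```

For example, ref_popcntb applied to 0x0103FF00 gives 0x01020800, because the source bytes 0x01, 0x03, 0xFF, and 0x00 contain 1, 2, 8, and 0 set bits respectively.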
Four new built-in functions for floating-point division are included in this release. These software implementations of floating-point division algorithms take advantage of the PowerPC architecture and can be significantly faster than corresponding hardware instructions when used in a vector context. The new built-ins are supported for all PowerPC processors, including POWER5.
By default, floating-point division that is coded in the source program
is compiled to hardware division instructions, although the compiler may
substitute the software division code where it deems that to be faster.
The new built-in functions allow the user to invoke the software
algorithms explicitly. The default rounding mode (round-to-nearest)
must be in effect when the routines are called.
Built-in functions for floating-point division | |
Function | Description |
---|---|
double __swdiv_nochk(double, double); | Floating-point division of double types; no range checking. Argument restrictions: numerators equal to infinity are not allowed; denominators equal to infinity, zero, or denormalized are not allowed; the quotient of numerator and denominator may not be equal to positive or negative infinity. |
double __swdiv(double, double); | Floating-point division of double types. No argument restrictions. |
float __swdivs_nochk(float, float); | Floating-point division of float types; no range checking. Argument restrictions: numerators equal to infinity are not allowed; denominators equal to infinity, zero, or denormalized are not allowed; the quotient of numerator and denominator may not be equal to positive or negative infinity. |
float __swdivs(float, float); | Floating-point division of float types. No argument restrictions. |
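Because the _nochk variants perform no range checking, a debug build might validate operands before calling them. The sketch below encodes the argument restrictions listed in the table for __swdiv_nochk. The helper name swdiv_nochk_args_ok is hypothetical and is not provided by the compiler; it uses an ordinary division to test the quotient, so it is intended for testing only, not for performance-critical code.

```c
#include <float.h>

/* 1 if x is +infinity or -infinity. NaN compares false with everything,
   so a NaN is not reported as infinite. */
static int is_infinite(double x)
{
    return x > DBL_MAX || x < -DBL_MAX;
}

/* 1 if x is nonzero but smaller in magnitude than the smallest
   normalized double. */
static int is_denormal(double x)
{
    double mag = (x < 0.0) ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Hypothetical debug helper: returns 1 if (num, den) satisfies the
   restrictions documented for __swdiv_nochk, and 0 otherwise.
   NaN operands are not covered by the documented restrictions and
   are not rejected here. */
int swdiv_nochk_args_ok(double num, double den)
{
    if (is_infinite(num))
        return 0;                 /* numerator may not be infinity */
    if (is_infinite(den) || den == 0.0 || is_denormal(den))
        return 0;                 /* denominator restrictions */
    if (is_infinite(num / den))
        return 0;                 /* quotient may not overflow to infinity */
    return 1;
}
```

When the check fails, a caller could fall back to __swdiv, which the table documents as having no argument restrictions.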
Pragma directives are described in detail in XL C/C++ Compiler
Reference.
Pragma | Description |
---|---|
#pragma novector | Prohibits the compiler from automatically vectorizing the loop that immediately follows it. Automatic vectorization refers to converting certain operations that are performed in a loop on successive elements of an array, into a call to a routine that computes several results at a time. |
#pragma nosimd | Prohibits the compiler from automatically generating VMX instructions in the loop that immediately follows it. |
#pragma unrollandfuse | A pragma for optimizing nested for loops. Instructs the compiler to replicate the body of the outer loop, which is itself a loop nest, and to fuse the replicas into a single unrolled loop nest. |
#pragma stream_unroll | Breaks a stream contained in a for loop into multiple streams. Intended for loops that have a large iteration count and a small number of streams. |
#pragma block_loop | Instructs the compiler to create a blocking loop for a specific for loop in a loop nest. Blocking a loop involves dividing the iteration space of a loop into parts or blocks. An additional outer loop is created, known as the blocking loop, which drives the original loop for each block. |
#pragma loopid | Marks a for loop with a scope-unique identifier. The identifier can be used by #pragma block_loop and others to control the transformations on that loop and to provide information on the loop transformations through the use of option -qreport. |
#pragma disjoint | C++ implementation added. |
extensions to #pragma unroll | Loop unrolling consists of replicating the body of a loop in order to reduce the number of iterations required to complete the loop. The #pragma unroll directive indicates to the compiler that the for loop that immediately follows the directive can be unrolled. The functionality of this pragma has been extended to allow it to be applied to both the innermost and outermost for loops. The extended #pragma functionality still excludes application to for loops that have alternate entry points. |
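To make the blocking and unrolling transformations described in the table more concrete, the following sketch writes them out by hand for a simple summation loop. It illustrates the shape of the transformed loops only; it is not the code that the compiler generates, and the function names are hypothetical.

```c
#include <stddef.h>

/* Original loop: one pass over a[0..n-1]. */
long sum_plain(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-blocked form: an additional outer "blocking loop" drives the
   original loop over one block of at most `block` iterations at a time. */
long sum_blocked(const int *a, size_t n, size_t block)
{
    long s = 0;
    if (block == 0)
        block = 1;                              /* avoid an empty block */
    for (size_t ii = 0; ii < n; ) {             /* blocking loop */
        size_t len = (n - ii < block) ? n - ii : block;
        for (size_t i = ii; i < ii + len; i++)  /* original loop, one block */
            s += a[i];
        ii += len;
    }
    return s;
}

/* Hand-unrolled form: the loop body is replicated four times, so the
   main loop runs about a quarter as many iterations. A remainder loop
   handles the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; n - i >= 4; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

All three functions compute the same result; the pragmas ask the compiler to perform transformations of this kind so that the source can keep the simple form of sum_plain.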
This release contains two new utilities related to the profile-directed feedback (PDF) compilation process. Through the use of profile-directed feedback, the compiler can provide an optimized executable that reflects how that executable ran in a number of different scenarios. A PDF record is produced as a side effect of running the instrumented executable in one of these scenarios. These records constitute the data that are collated to define typical program behavior.
The showpdf command provides the ability to display the call and block counts for all procedures executed in a profile-directed feedback training run. The utility requires compilation under the options -qpdf1 and -qshowpdf.
The mergepdf command allows the user to specify the relative importance of two or more PDF records and to combine them into a single record. This allows the user to compensate for training runs with higher execution counts (that is, longer run time), which would otherwise dominate the profile data.
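The effect of weighting can be illustrated with a small sketch. It is not the algorithm or the record format that mergepdf uses, neither of which is described here. It assumes, purely for illustration, that each record is an array of block counts, that each record is first normalized by its own total, and that the normalized counts are then combined using the user-supplied weights. The function name merge_profiles is hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of weighted profile merging.
   counts  : nrec records of nblocks block counts each, stored row by row
   weights : relative importance of each record
   merged  : output array of nblocks weighted, normalized counts */
void merge_profiles(const unsigned long *counts, const double *weights,
                    size_t nrec, size_t nblocks, double *merged)
{
    for (size_t b = 0; b < nblocks; b++)
        merged[b] = 0.0;

    for (size_t r = 0; r < nrec; r++) {
        const unsigned long *rec = counts + r * nblocks;

        /* Total execution count of this training run. */
        double total = 0.0;
        for (size_t b = 0; b < nblocks; b++)
            total += (double)rec[b];
        if (total == 0.0)
            continue;             /* an empty record contributes nothing */

        /* Normalizing by the total keeps a long run from dominating;
           the weight then sets the relative importance of the run. */
        for (size_t b = 0; b < nblocks; b++)
            merged[b] += weights[r] * (double)rec[b] / total;
    }
}
```

With equal weights, a run with block counts {10, 30} and a much longer run with counts {3000, 1000} contribute equally under this model, which is the kind of compensation that the weighting is meant to provide.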
The compiler adds support for the 32-bit and 64-bit mode versions of the IBM Mathematics Acceleration Subsystem (MASS) vector library: libmassvp4.a and libmassvp4_64.a, respectively. These libraries contain vector routines for single-precision and double-precision reciprocal and square root functions. The vector libraries are thread-safe and offer improved performance over the corresponding libm routines.
Starting with Version 7.0, the MASS libraries ship with the compiler.
OpenMP API V2.0 support for C, C++, and Fortran
The OpenMP Application Program Interface (API) is a portable, scalable programming model that provides a standard interface for developing multiplatform, shared-memory parallel applications in C, C++, and Fortran. The specification is defined by the OpenMP organization, a group of major computer hardware and software vendors, which includes IBM. XL C/C++ Advanced Edition is compliant with OpenMP Specification 2.0: the compiler recognizes and preserves the semantics of the following OpenMP V2.0 elements:
Enhanced Unicode and NLS support
As recommended in a recent report from the C Standard committee, the C compiler extends C99 to add new data types to support UTF-16 and UTF-32 literals. The C++ compiler also supports these new data types for compatibility with C.
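The section does not spell out the literal syntax, so the following sketch assumes the u"..." and U"..." prefixes that were later standardized in C11, compiled with a C11 compiler. The element types are written as uint_least16_t and uint_least32_t, which C11 defines as the underlying types of char16_t and char32_t. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit code units before the terminating zero. */
size_t utf16_units(const uint_least16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of code points in a well-formed UTF-16 string: every unit
   counts except low (trailing) surrogates, 0xDC00 through 0xDFFF. */
size_t utf16_code_points(const uint_least16_t *s)
{
    size_t n = 0;
    for (; *s != 0; s++)
        if (*s < 0xDC00 || *s > 0xDFFF)
            n++;
    return n;
}

/* Number of 32-bit code units, which equals the number of code points. */
size_t utf32_units(const uint_least32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

A character outside the Basic Multilingual Plane, such as U+1F600, occupies two code units in a UTF-16 literal (a surrogate pair) but a single code unit in a UTF-32 literal, so utf16_units and utf16_code_points differ for such strings.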
Support for Boost libraries
The XL C++ compiler delivers a high level of compatibility with the 1.30.2 Boost libraries. These libraries were created to provide a set of reusable, Open Source C++ libraries that are suitable for standardization. For more information, see the Boost web site at http://www.boost.org.
Language extensions related to GNU C and C++
The GNU C extensions to C99 and the GNU C++ extensions to Standard C++ are
not industry standards. Nevertheless, these non-proprietary language
features from the Open Source community have attained a certain
currency. XL C/C++ implements a subset of the GNU C and C++
extensions. Support for the following GNU C features has been added in
this release.
Feature | Remarks |
---|---|
Labels as Values | Including computed goto statements. This feature is now fully compatible with the GNU C implementation. |
Type Attributes | Attributes aligned, packed, transparent_union. |
Function Attributes | Attributes format, format_arg, always_inline, noinline. |
Variable Attributes | C++ support added for attribute section. |
Alternate Keywords | Internal changes to implementation of __extension__. |
Nested Functions | |
Cast to a Union Type | |
Macros with a Variable Number of Arguments | Using an identifier in place of __VA_ARGS__ and removing trailing comma when no __VA_ARGS__ arguments are specified. |
gcc Inline Assembler Instructions with C Expression Operands | Partial support only. |
GNU C Complex Types | C++ support added. |
GNU C Hexadecimal Float Constants | C++ support added. |
C99 Compound Literals | C++ support added. |
Arrays of Length Zero | C++ support added. |
Variable Length Arrays | C++ support added. |
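As an illustration of the Labels as Values feature, the following sketch uses GNU C computed goto statements to dispatch the instructions of a tiny interpreter. The opcode set and the function name run are hypothetical, and the code requires a compiler that accepts the GNU C extension. No bounds checking is done on the opcodes, so the program array must contain valid opcodes only and must end with OP_HALT.

```c
/* Hypothetical opcodes for a one-register machine. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Executes `code` starting with accumulator value `acc` and returns
   the accumulator when OP_HALT is reached. */
int run(const unsigned char *code, int acc)
{
    /* The unary && operator yields the address of a label (GNU C).
       The table is indexed by opcode. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };

    goto *dispatch[*code++];      /* computed goto to the first handler */

do_inc:
    acc += 1;
    goto *dispatch[*code++];
do_dec:
    acc -= 1;
    goto *dispatch[*code++];
do_double:
    acc *= 2;
    goto *dispatch[*code++];
do_halt:
    return acc;
}
```

Each handler jumps directly to the next handler through the table, which avoids returning to a central switch statement after every instruction.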
New C++ compiler invocation
The compiler invocation xlc++ has been added for portability among all supported platforms. The invocation is equivalent to the invocation xlC on all platforms and is recommended. However, xlC is still fully supported.
Documentation
A man page is provided for the compiler invocation commands and for each command-line utility. The man page for the compiler invocations replaces the text help file, which was provided in previous releases.
Template registry enhancement
The C++ compiler uses a batch template instantiation scheme that involves a registry of template instantiations. In this release, the compiler adds versioning information to the template registry file that is created. This information is used by the compiler internally to track which version of the template registry file format should be used.
New and changed compiler options are described in detail in the online
documentation.
Option | Description and remarks |
---|---|
-qabi_version=n | Instructs the compiler to use version n of the C++ ABI. The default depends on the operating system. |
-qaltivec | Enables compiler support for AltiVec data types. The default is -qnoaltivec. |
-qasm=gcc | Enables partial support for assembler instructions with C expression operands. Instructs the compiler to recognize the asm keyword and its alternate spellings and to use the gcc syntax and semantics for the keyword. The default is -qnoasm. |
-qasm_as | Specifies the path and flags used to invoke an alternate assembler program in order to handle the code in an asm directive. This option overrides the default setting of the as command defined in the compiler configuration file. |
-qdirectstorage | Asserts that write-through enabled or cache-inhibited storage may be referenced in a given compilation unit. The intention of this option is to avoid unexpected behavior due to different storage control attributes that are allowed by the PowerPC architecture. The default is -qnodirectstorage. |
-qenablevmx | Instructs the compiler to generate VMX (AltiVec) code in any compiler phase. This option ensures the correct default setting of -qaltivec for the operating system in the development environment. For RHEL 3 U3 and Y-HPC, the default is -qnoenablevmx. For RHEL 4 and SLES 9, the default is -qenablevmx. |
-qkeepparm | Ensures that the parameters of a function passed in registers are saved onto the stack, instead of possibly being moved to different memory locations to improve performance. The default is -qnokeepparm. |
-qipa=threads[=N] \| nothreads | Instructs the optimizer to create N threads and run up to N backends in parallel, where N is an integer in the range 1-MAXINT. The default is the nothreads suboption, which is equivalent to running a single serial process. You can use this feature to reduce the IPA link time on multiprocessor computers. |
-qnoprefetch | Instructs the compiler not to automatically insert software prefetch instructions, thus allowing the user to turn off this aspect of optimization. The default is -qprefetch. |
-qnotrigraph | Instructs the compiler not to interpret trigraph sequences, regardless of the specified language level. On Linux, the default is -qtrigraph. |
-qsaveopt | Instructs the compiler to save the command-line options against which a source file is compiled into the corresponding object file. The option has no effect if compilation does not result in a .o file. The default is -qnosaveopt. |
-qshowpdf | When specified with -qpdf1 and an optimization level of -O or higher, the compiler inserts additional profiling information into the compiled application to collect call and block counts for all procedures in the application. Running the compiled application records the call and block counts to the file ._pdf. The contents of ._pdf can then be retrieved with the showpdf utility. The default is -qnoshowpdf. |
-qsourcetype | Controls the interpretation of input file names. The default behavior is that the programming language of a source file is implied by the suffix of its file name. The default is -qsourcetype. |
-qutf | Enables the recognition of UTF literal syntax which provides 16-bit and 32-bit string literals for Unicode encoding forms. |
-qflttrap=nanq | Instructs the compiler to generate extra instructions in the code to trap NaNQs (Not a Number Quiet). The intent is to detect all NaNQs handled by or generated by floating point instructions, including those created by valid operations. |
-qhot=simd | Instructs the compiler to attempt automatic SIMD vectorization. The default is -qhot=nosimd. |
-qipa=infrequentlabel | Specifies a list of labels that are likely to be called infrequently during the course of a typical program run. The compiler can make other parts of the program faster by doing less optimization for calls to these labels. This option is only applicable to user-defined labels. |