Performance and optimization

Many new features and enhancements fall into the category of optimization and performance tuning.

Architecture and processor-specific code tuning

The -qarch compiler option controls the particular instructions that are generated for the specified machine architecture. The -qtune compiler option adjusts the instructions, scheduling, and other optimizations to enhance performance on the specified hardware. These options work together to generate application code that gives the best performance for the specified architecture.

XL Fortran V10.1 adds suboptions to the -qarch compiler option to support processors that implement the VMX instruction set and the newly available POWER5+ processors. The following new -qarch suboptions are available:

High performance libraries

XL Fortran includes highly tuned mathematical functions that can greatly improve the performance of mathematically intensive applications. These functions are provided through the following high-performance libraries:

Mathematical Acceleration Subsystem (MASS)
MASS libraries provide high-performance scalar and vector functions to perform common mathematical computations. The MASS libraries included with XL Fortran Advanced Edition V10.1 for Linux introduce new scalar and vector functions, and new support for the POWER5 processor architecture.

For more information about using the MASS libraries, see Using the Mathematical Acceleration Subsystem.

Basic Linear Algebra Subprograms (BLAS)
XL Fortran Advanced Edition V10.1 for Linux introduces a set of high-performance BLAS linear algebra functions. You can use these functions to:

For more information about using the BLAS functions, see Using the Basic Linear Algebra Subprograms.
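
As an illustration of the kind of operation a BLAS routine performs, the following Python sketch models the standard Level 2 BLAS GEMV operation, y := alpha*op(A)*x + beta*y, where op(A) is A or its transpose. It is a plain reference model written for clarity, not the tuned XL Fortran implementation. The Python function name and the list-of-rows matrix layout are assumptions of this sketch; Fortran BLAS routines instead take column-major arrays together with explicit dimension, leading-dimension, and increment arguments.

```python
from typing import List, Sequence


def gemv(alpha: float, a: Sequence[Sequence[float]], x: Sequence[float],
         beta: float, y: Sequence[float], trans: bool = False) -> List[float]:
    """Reference model of the BLAS GEMV operation: alpha*op(A)*x + beta*y.

    a is an m-by-n matrix given as a list of rows; op(A) is A, or A
    transposed when trans is True. Unlike the real routine, which updates
    y in place, this model returns a new list.
    """
    m = len(a)
    n = len(a[0]) if m else 0
    rows, cols = (n, m) if trans else (m, n)  # shape of op(A)
    if len(x) != cols or len(y) != rows:
        raise ValueError("dimensions of A, x and y do not match")
    # As in reference BLAS, y is not read when beta is zero.
    base = [0.0] * rows if beta == 0 else [beta * v for v in y]

    def op_a(i: int, j: int) -> float:
        # Element (i, j) of op(A).
        return a[j][i] if trans else a[i][j]

    return [alpha * sum(op_a(i, j) * x[j] for j in range(cols)) + base[i]
            for i in range(rows)]


print(gemv(2.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], 1.0, [10.0, 20.0]))  # [16.0, 34.0]
```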

Other performance-related compiler options and directives

The entries in the following table describe new or changed compiler options and directives not already mentioned in the sections above.

The information presented here is only a brief overview. For more information about these compiler options, refer to Options for performance optimization.

Table 2. Other Performance-Related Compiler Options and Directives
Option/directive Description
-qfloat -qfloat adds the following new suboptions:
-qfloat=relax
This suboption relaxes strict IEEE conformance in exchange for greater speed, typically by removing trivial floating-point operations such as additions and subtractions of a zero right operand (for example, x + 0.0 or x - 0.0).
-qfloat=norelax
This is the default. Strict IEEE conformance is maintained.
-qipa -qipa adds the following new suboptions:
-qipa=clonearch=arch{,arch}
Specifies one or more processor architectures for which the compiler generates additional versions of the same code, each specialized for one of those architectures.

XL Fortran lets you specify several processor architectures for which code is generated. At run time, the application detects the architecture of the processor it is running on and selects the version of the code specialized for that architecture.

-qipa=cloneproc=name{,name}
Specifies the names of one or more functions to clone for the processor architectures specified by the clonearch suboption.
-O Specifying the -O3 compiler option now instructs the compiler to also assume the -qhot=level=0 compiler option setting.

Specifying the -O4 or -O5 compiler option now instructs the compiler to also assume the -qhot=level=1 compiler option setting.

-qsmallstack The -qsmallstack compiler option adds the following new suboptions to control transformations that affect the allocation of dynamic-length variables:
-qsmallstack=dynlenonheap
When this suboption is specified, certain automatically sized objects are allocated from the heap. This suboption affects automatic objects that have nonconstant character lengths or a nonconstant array bound (DYNamic LENgth ON HEAP). Specifying this suboption turns on both dynlenonheap and general smallstack transformations.
-qsmallstack=nodynlenonheap
This is the default. Automatic objects that have nonconstant character lengths or a nonconstant array bound are allocated on the stack.
-qstacktemp The -qstacktemp compiler option is new and lets you control where certain compiler temporaries are stored. The available suboptions are:
-qstacktemp=0
This is the default. The compiler decides whether to allocate certain compiler temporaries on the heap instead of the stack, depending on the size of the temporaries and the target operating system environment.
-qstacktemp=-1
Certain compiler temporaries are always allocated on the stack. This provides the best performance but also uses the most stack space.
-qstacktemp=num_bytes
Certain compiler temporaries smaller than num_bytes are allocated on the stack. Compiler temporaries greater than or equal to num_bytes in size are allocated on the heap.
Programs that use large arrays may need to use this option if they run out of stack space at run time. SMP or OpenMP applications that are constrained by stack space may also find this option useful for moving some compiler temporaries from the stack to the heap.
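
The -qfloat=relax suboption in Table 2 may remove operations such as x + 0.0. Under strict IEEE arithmetic such an addition is not always a no-op because of signed zero: in the default rounding mode, (-0.0) + (+0.0) is +0.0, so removing the addition can change the sign of a zero result. The following Python sketch demonstrates the difference (Python floats are IEEE 754 double precision on common platforms). The two function names are hypothetical and simply stand for the expression before and after the transformation; the sketch does not show what XL Fortran actually generates.

```python
import math


def strict_add_zero(x: float) -> float:
    """The expression x + 0.0 evaluated with IEEE semantics."""
    return x + 0.0


def relaxed_add_zero(x: float) -> float:
    """The same expression after the + 0.0 has been removed."""
    return x


x = -0.0
print(strict_add_zero(x) == relaxed_add_zero(x))  # True: +0.0 == -0.0
print(math.copysign(1.0, strict_add_zero(x)))     # 1.0  (result is +0.0)
print(math.copysign(1.0, relaxed_add_zero(x)))    # -1.0 (result is -0.0)
```

The two results compare equal, but the sign of zero is still observable, for example in the sign of the infinity produced by dividing by it, which is why the addition can be removed only when strict IEEE conformance is relaxed.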

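Conceptually, the clonearch and cloneproc suboptions of -qipa described in Table 2 produce a form of function multiversioning: several versions of the same procedure, each compiled for a different architecture, plus a run-time step that selects one of them. The Python sketch below models only that selection step. The procedure names, the architecture names arch_a and arch_b, and the fallback to a generic version when no clone matches are all hypothetical assumptions of this sketch; XL Fortran generates the clones and the dispatch itself.

```python
from typing import Callable, Dict, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline version, used when no specialized clone matches (assumption)."""
    return sum(x * y for x, y in zip(a, b))


def dot_arch_a(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a clone built for hypothetical architecture 'arch_a'."""
    return sum(x * y for x, y in zip(a, b))


def dot_arch_b(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a clone built for hypothetical architecture 'arch_b'."""
    return sum(x * y for x, y in zip(a, b))


# One clone per architecture listed in clonearch=arch{,arch}.
CLONES: Dict[str, Kernel] = {"arch_a": dot_arch_a, "arch_b": dot_arch_b}


def select_clone(detected_arch: str) -> Kernel:
    """Return the clone for the architecture detected at run time."""
    return CLONES.get(detected_arch, dot_generic)


# In a real program the architecture is detected by the runtime, not passed in.
dot = select_clone("arch_b")
print(dot([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

Every clone must compute the same result; only the instructions used to compute it differ, which is what makes the selection transparent to the rest of the program.
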
Intrinsic procedures new for this release

The following table lists intrinsic procedures that are new for this release. For more information on intrinsic procedures provided by XL Fortran, see Intrinsic procedures.

Table 3. Intrinsic procedures for XL Fortran
Function Description
FRIM(val) Takes an input val in REAL*8 format, rounds val toward minus infinity (to the largest integral value that is not greater than val), and returns the result in REAL*8 format. Valid only for POWER5+ processors.
FRIMS(val) Takes an input val in REAL*4 format, rounds val toward minus infinity (to the largest integral value that is not greater than val), and returns the result in REAL*4 format. Valid only for POWER5+ processors.
FRIN(val) Takes an input val in REAL*8 format, rounds val to the nearest integral value, and returns the result in REAL*8 format. Valid only for POWER5+ processors.
FRINS(val) Takes an input val in REAL*4 format, rounds val to the nearest integral value, and returns the result in REAL*4 format. Valid only for POWER5+ processors.
FRIP(val) Takes an input val in REAL*8 format, rounds val toward plus infinity (to the smallest integral value that is not less than val), and returns the result in REAL*8 format. Valid only for POWER5+ processors.
FRIPS(val) Takes an input val in REAL*4 format, rounds val toward plus infinity (to the smallest integral value that is not less than val), and returns the result in REAL*4 format. Valid only for POWER5+ processors.
FRIZ(val) Takes an input val in REAL*8 format, rounds val toward zero (discarding its fractional part), and returns the result in REAL*8 format. Valid only for POWER5+ processors.
FRIZS(val) Takes an input val in REAL*4 format, rounds val toward zero (discarding its fractional part), and returns the result in REAL*4 format. Valid only for POWER5+ processors.
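
To make the four rounding directions in Table 3 concrete, the following Python sketch models the REAL*8 intrinsics, with Python floats (IEEE double precision) standing in for REAL*8. The function and helper names are hypothetical, and the sketch models only the arithmetic result, not the POWER5+ instructions the compiler uses. The table does not say how FRIN breaks ties or how special values are handled; rounding halfway cases away from zero, passing infinities and NaNs through unchanged, and preserving the sign of zero results are assumptions of this sketch. The REAL*4 variants (FRIMS, FRINS, FRIPS, FRIZS) apply the same rounding to REAL*4 values and are not modeled separately.

```python
import math
from typing import Callable


def _round_with(op: Callable[[float], int], val: float) -> float:
    """Apply an integer-rounding function and return the result as a float."""
    if not math.isfinite(val):
        # ASSUMPTION: infinities and NaNs are passed through unchanged.
        return val
    # ASSUMPTION: a zero result keeps the sign of val (ceil(-0.5) gives -0.0),
    # as IEEE 754 round-to-integral operations do.
    return math.copysign(float(op(val)), val)


def _nearest(val: float) -> int:
    """Round to the nearest integer; ties go away from zero (ASSUMPTION)."""
    t = math.trunc(val)
    if abs(val - t) >= 0.5:  # val - t is exact in binary floating point
        t += 1 if val > 0 else -1
    return t


def frim(val: float) -> float:
    """Model of FRIM: round toward minus infinity."""
    return _round_with(math.floor, val)


def frin(val: float) -> float:
    """Model of FRIN: round to the nearest integral value."""
    return _round_with(_nearest, val)


def frip(val: float) -> float:
    """Model of FRIP: round toward plus infinity."""
    return _round_with(math.ceil, val)


def friz(val: float) -> float:
    """Model of FRIZ: round toward zero."""
    return _round_with(math.trunc, val)


print(frim(-3.7), frin(-3.7), frip(-3.7), friz(-3.7))  # -4.0 -4.0 -3.0 -3.0
```

Note that an input that is already integral is returned unchanged by all four models (frim(2.0) is 2.0), and that _nearest avoids the common floor(val + 0.5) shortcut, which rounds 0.49999999999999994 up to 1.0 because the addition itself rounds.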

In addition, this release of XL Fortran features several new intrinsic procedures and a new VECTOR data type to support VMX vector programming. For more information about these new intrinsic procedures, see VMX intrinsic procedures.
