Advanced command-line optimization

After applying basic optimizations and successfully compiling and executing your application, you can apply more powerful optimization tools. Higher optimization levels can have a tremendous impact on performance, but they can involve trade-offs in code size, compilation time, resource requirements, and numeric or algorithmic precision. The XL compiler optimization portfolio includes many options for directing advanced optimization, and the transformations your application undergoes are largely under your control. The discussion of each optimization level in the Advanced optimizations table covers not only the performance benefits and the possible trade-offs, but also how you can help guide the optimizer to find the best solutions for your application.

Table 4. Advanced optimizations
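Optimization level: -O3
  • Additional options implied: -qnostrict, -qmaxmem=-1, -qhot=level=0
  • Complementary options: -qarch, -qtune
  • Options with possible benefits: -qpdf
Optimization level: -O4
  • Additional options implied: -qhot, -qipa, -qarch=auto, -qtune=auto, -qcache=auto
  • Complementary options: -qarch, -qtune, -qcache
  • Options with possible benefits: -qpdf, -qsmp=auto
Optimization level: -O5
  • Additional options implied: All of -O4, -qipa=level=2
  • Complementary options: -qarch, -qtune, -qcache
  • Options with possible benefits: -qpdf, -qsmp=auto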

Optimization at level 3

Specifying -O3 initiates more intense low-level transformations that remove many of the limitations present at -O2. For instance, the optimizer no longer checks for memory limits, because -O3 defaults to -qmaxmem=-1. Additionally, optimizations encompass larger program regions and attempt deeper analysis. While not all applications contain opportunities for the optimizer to provide a measurable increase in performance, most applications can benefit from this type of analysis. Some differences between -O2 and -O3 optimization that can result in a performance benefit include:

Potential trade-offs at level 3

With the in-depth analysis of -O3 comes a trade-off in terms of compilation time and memory resources. Also, since -O3 implies -qnostrict, the optimizer can alter certain floating-point semantics in your application to gain execution speed. This typically involves precision trade-offs as follows:

You can still gain most of the -O3 benefits while preserving precise floating-point semantics by specifying -qstrict. Compiling with -qstrict is necessary if you require floating-point results that match those produced at -O0 or -O2 exactly. The -qstrict compiler option also ensures adherence to all IEEE semantics for floating-point operations. If your application is sensitive to floating-point exceptions or the order of evaluation for floating-point arithmetic, compiling with -qstrict helps ensure accurate results. Without -qstrict, the difference in computation for any one source-level operation is very small in comparison to basic optimization. Although a small difference can compound if the operation is in a loop structure where the difference becomes additive, most applications are not sensitive to the changes that can occur in floating-point semantics.
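The sensitivity to evaluation order can be seen without the compiler at all. The sketch below first uses awk, which computes in double precision, to show that regrouping a sum changes the result. This is the kind of reordering that matters to applications described above as sensitive to the order of evaluation. It then prints a compile command that combines -O3 with -qstrict. The file name solver.f and the FC and DRY_RUN variables are assumptions made for illustration; with DRY_RUN left at 1 the command is only printed, not executed.

```shell
#!/bin/sh
# Part 1: floating-point addition is not associative, so regrouping changes the result.
fp_order=$(awk 'BEGIN {
    a = (0.1 + 0.2) + 0.3      # left-to-right grouping
    b = 0.1 + (0.2 + 0.3)      # same terms, regrouped
    print ((a == b) ? "same" : "different")
}')
echo "regrouped sum is $fp_order"

# Part 2: keep -O3 but request strict floating-point semantics.
# FC and solver.f are hypothetical; DRY_RUN=1 prints the command only.
FC="${FC:-xlf}"
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}
run "$FC" -O3 -qstrict -o solver solver.f
```

Setting DRY_RUN=0 on a system where the compiler is installed would invoke it instead of only printing the command.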

See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.

An intermediate step: adding -qhot suboptions at level 3

At -O3, the optimization includes minimal -qhot loop transformations at level=0 to increase performance. You can further increase your performance benefit by increasing the level and therefore the aggressiveness of -qhot. Try specifying -qhot without any suboptions, or -qhot=level=1. The following -qhot suboptions can also provide additional performance benefits, depending on the characteristics of your application:
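As a rough sketch of the step described above, the following script prints two builds of the same source: one at plain -O3, which already includes -qhot at level=0, and one that raises the level to 1. Comparing the run times of the two executables is one way to judge whether the more aggressive setting pays off. The file name app.f, the output names, and the FC and DRY_RUN variables are hypothetical; with DRY_RUN left at 1 nothing is compiled.

```shell
#!/bin/sh
# FC and the file names are assumptions; DRY_RUN=1 prints the commands only.
FC="${FC:-xlf}"
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Baseline: -O3 on its own, with the minimal loop transformations it implies.
run "$FC" -O3 -o app_o3 app.f

# Intermediate step: same level, more aggressive high-order transformations.
run "$FC" -O3 -qhot=level=1 -o app_o3_hot app.f
```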

For more information on -qhot, see Benefits of high-order transformation (HOT).

Optimization at level 4

Optimizing at -O4 builds on -O3 by triggering -qipa=level=1, which performs interprocedural analysis (IPA), optimizing your entire application as a unit. This option is particularly pertinent to applications that contain a large number of frequently used routines. Some optimizations that interprocedural analysis can perform are as follows:

To make full use of IPA optimizations, you must specify -O4 on both the compilation and link steps of your application build, because interprocedural analysis occurs in stages at both compile and link time.

The IPA process

  1. At compilation time, optimizations occur on a file-by-file basis, and the compiler prepares for the link stage. IPA writes analysis information directly into the object files the compiler produces.
  2. At the link stage, IPA reads the information from the object files and analyzes the entire application.
  3. This analysis guides the optimizer on how to rewrite and restructure your application and apply appropriate -O3 level optimizations.
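The staged process above might look like the following in a build script. This is a minimal sketch, assuming two hypothetical source files, main.f and solver.f. Each file is compiled separately with -O4 and -c, and the link step repeats -O4 so that IPA can read the information stored in the object files. FC and DRY_RUN are illustrative variables; with DRY_RUN left at 1 the commands are only printed.

```shell
#!/bin/sh
# FC and the file names are assumptions; DRY_RUN=1 prints the commands only.
FC="${FC:-xlf}"
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Compile step: per-file optimization; IPA information goes into each object file.
run "$FC" -O4 -c main.f
run "$FC" -O4 -c solver.f

# Link step: -O4 again, so IPA analyzes the whole application.
run "$FC" -O4 -o app main.o solver.o
```

Dropping -O4 from the last command would leave the link-time stage of the analysis out of the build, which is why the flag appears on every line.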

The Benefits of interprocedural analysis (IPA) section contains more information on IPA including details on IPA suboptions.

Beyond -qipa, -O4 enables other optimization options, as listed in the Advanced optimizations table: -qhot, -qarch=auto, -qtune=auto, and -qcache=auto.

Potential trade-offs at level 4

In addition to the trade-offs already mentioned for -O3, specifying -qipa can significantly increase compilation time, especially at the link step.

See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.

Optimization at level 5

As the highest base optimization level, -O5 includes all -O4 optimizations and deepens whole program analysis by increasing the -qipa level to 2. Compiling with -O5 also increases how aggressively the optimizer pursues aliasing improvements. Additionally, if your application contains a mix of XL C/C++ and Fortran code that you compile using XL compilers, you can increase performance by compiling and linking your code with the -O5 option.
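For the mixed-language case, a build could be sketched as follows. The assumptions here are significant: kernels.c and main.f are hypothetical files, xlc and xlf are taken as the C and Fortran invocation commands, and the Fortran driver is used for the link step. A real mixed-language link may need additional libraries or options that this sketch does not show. With DRY_RUN left at 1 the commands are only printed.

```shell
#!/bin/sh
# CC, FC and the file names are assumptions; DRY_RUN=1 prints the commands only.
CC="${CC:-xlc}"
FC="${FC:-xlf}"
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Compile each language with its own XL compiler at the same level.
run "$CC" -O5 -c kernels.c
run "$FC" -O5 -c main.f

# Link with -O5 as well (linking through the Fortran driver is an assumption).
run "$FC" -O5 -o app main.o kernels.o
```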

Potential trade-offs at level 5

Compiling at -O5 consumes more time and machine resources than any other optimization level, particularly if you include -O5 on the IPA link step. Compile at -O5 only as the final phase in your optimization process, after successfully compiling and executing your application at -O4.
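The recommended ordering can be expressed as a small staged script: build and run at -O4 first, and move on to -O5 only if that succeeds. This is a sketch under stated assumptions. The file main.f, the executable name app, and running ./app as the validation step are all hypothetical, as are the FC and DRY_RUN variables. With DRY_RUN left at 1 the commands are only printed.

```shell
#!/bin/sh
# FC and the file names are assumptions; DRY_RUN=1 prints the commands only.
FC="${FC:-xlf}"
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Try -O4 first; attempt -O5 only after -O4 compiles, links, and runs.
for level in -O4 -O5; do
    run "$FC" "$level" -c main.f      || exit 1
    run "$FC" "$level" -o app main.o  || exit 1   # same level on the link step
    run ./app                         || exit 1   # hypothetical validation run
done
```

Because each step exits on failure, the costly -O5 build is never started when the -O4 build or its run fails.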

See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.