After applying basic optimizations and successfully compiling and executing your application, you can apply more powerful optimization tools. Higher optimization levels can have a tremendous impact on performance, but they can involve trade-offs in code size, compilation time, resource use, and numeric or algorithmic precision. The XL compiler optimization portfolio includes many options for directing advanced optimization, and the transformations your application undergoes are largely under your control. The discussion of each optimization level in the Advanced optimizations table covers not only the performance benefits and the possible trade-offs, but also how you can help guide the optimizer to find the best solutions for your application.
Optimization level | Additional options implied | Complementary options | Options with possible benefits |
-O3 | | | |
-O4 | | | |
-O5 | | | |
Specifying -O3 initiates more intense low-level transformations that remove many of the limitations present at -O2. For instance, -O3 defaults to -qmaxmem=-1, so the optimizer no longer checks for memory limits. Additionally, optimizations encompass larger program regions and perform deeper analysis. While not all applications contain opportunities for the optimizer to provide a measurable increase in performance, most applications can benefit from this type of analysis. Some differences between -O2 and -O3 optimization that can result in performance benefits include:
With the in-depth analysis of -O3 comes a trade-off in terms of compilation time and memory resources. Also, since -O3 implies -qnostrict, the optimizer can alter certain floating-point semantics in your application to gain execution speed. This typically involves precision trade-offs as follows:
You can still gain most of the -O3 benefits while preserving precise floating-point semantics by specifying -qstrict. Compiling with -qstrict is necessary if you require the same floating-point precision as -O0 or -O2 results. The -qstrict compiler option also ensures adherence to all IEEE semantics for floating-point operations. If your application is sensitive to floating-point exceptions or to the order of evaluation for floating-point arithmetic, compiling with -qstrict helps ensure accurate results. Without -qstrict, the difference in computation for any one source-level operation is very small compared with results from basic optimization. Although a small difference can compound if the operation is in a loop structure where the difference becomes additive, most applications are not sensitive to the changes that can occur in floating-point semantics.
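As an illustration of the choice described above, the following sketch builds a compile command that keeps -O3 but adds -qstrict. The xlf95 invocation command and the file name solver.f90 are assumptions made for this example. The small run helper is also hypothetical: it prints each command and runs it only on machines where the program is installed.

```shell
FC=xlf95                  # assumed XL Fortran invocation command
FFLAGS="-O3 -qstrict"     # -O3 transformations with strict floating-point semantics

# Hypothetical helper: print the command, then run it only if the program exists.
run() {
  echo "$@"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# solver.f90 is a placeholder source file name.
run "$FC" $FFLAGS -c solver.f90
```

Comparing the results of such a build against an -O2 build is one way to check whether your application is sensitive to the relaxed semantics.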
See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.
At -O3, the optimizer applies minimal -qhot loop transformations, at level=0, to increase performance. You can further increase your performance benefit by raising the level, and therefore the aggressiveness, of -qhot. Try specifying -qhot without any suboptions, or -qhot=level=1. The following -qhot suboptions can also provide additional performance benefits, depending on the characteristics of your application:
For more information on -qhot, see Benefits of high-order transformation (HOT).
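The two -qhot settings suggested for use with -O3 can be sketched as the following alternative compile commands. The xlf95 invocation command, the file name loops.f90, and the run helper are assumptions for illustration only.

```shell
FC=xlf95   # assumed XL Fortran invocation command

# Hypothetical helper: print the command, then run it only if the program exists.
run() {
  echo "$@"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Alternative 1: -qhot with no suboptions (loops.f90 is a placeholder file name).
run "$FC" -O3 -qhot -c loops.f90

# Alternative 2: request the HOT level explicitly.
run "$FC" -O3 -qhot=level=1 -c loops.f90
```

Both commands write the same object file, so in practice you would pick one of them and measure the result against a plain -O3 build.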
Optimizing at -O4 builds on -O3 by triggering -qipa=level=1, which performs interprocedural analysis (IPA) and optimizes your entire application as a unit. This option is particularly pertinent to applications that contain a large number of frequently used routines. Some optimizations that interprocedural analysis can perform are as follows:
To make full use of IPA optimizations, you must specify -O4 on both the compilation and link steps of your application build, because interprocedural analysis occurs in stages at both compile and link time.
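A minimal sketch of a two-step build that passes -O4 at both stages might look like the following. The xlf95 invocation command, the file names, and the run helper are assumptions for illustration.

```shell
FC=xlf95     # assumed XL Fortran invocation command
OPT="-O4"    # same level on every step

# Hypothetical helper: print the command, then run it only if the program exists.
run() {
  echo "$@"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Compile steps: the compile-time stage of IPA runs here.
run "$FC" $OPT -c main.f90
run "$FC" $OPT -c solver.f90

# Link step: repeat -O4 so that the link-time stage of IPA also runs.
run "$FC" $OPT main.o solver.o -o app
```

Keeping the level in a single variable such as OPT makes it harder to omit the option from the link step by accident.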
The Benefits of interprocedural analysis (IPA) section contains more information on IPA including details on IPA suboptions.
Beyond -qipa, -O4 enables other optimization options:
-qhot enables more aggressive HOT transformations to optimize loop constructs and Fortran array language.
Optimizes array data to run mathematical operations in parallel where applicable.
-qarch=auto optimizes your application to execute on a hardware architecture identical to that of your build machine. If the architecture of your build machine is incompatible with your application's execution environment, you must specify a different -qarch suboption after the -O4 option. This overrides -qarch=auto.
-qcache=auto optimizes your application for the cache configuration of a specific hardware architecture. The auto suboption assumes that the cache configuration of your build machine is identical to that of your execution environment. Specifying a cache configuration can increase program performance, particularly for loop operations, because the optimizer can block loops so that they process only the amount of data that fits in the data cache.
If you will execute your application on a different machine, specify the correct cache values or use -qnocache to disable the auto suboption.
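When the build machine and the target machine differ, the overrides described above can be sketched as follows. The xlf95 invocation command, the pwr7 architecture value, the file name, and the run helper are assumptions for illustration; choose the -qarch suboption that matches your own execution environment.

```shell
FC=xlf95   # assumed XL Fortran invocation command

# Hypothetical helper: print the command, then run it only if the program exists.
run() {
  echo "$@"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# -qarch comes after -O4 so that it overrides the implied -qarch=auto.
# -qnocache disables the implied auto cache setting.
# pwr7 and main.f90 are placeholder values.
run "$FC" -O4 -qarch=pwr7 -qnocache main.f90 -o app
```

The order matters here: placing -qarch before -O4 would leave the implied auto setting in effect.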
In addition to the trade-offs already mentioned for -O3, specifying -qipa can significantly increase compilation time, especially at the link step.
See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.
As the highest base optimization level, -O5 includes all -O4 optimizations and deepens whole program analysis by increasing the -qipa level to 2. Compiling with -O5 also increases how aggressively the optimizer pursues aliasing improvements. Additionally, if your application contains a mix of XL C/C++ and Fortran code that you compile using XL compilers, you can increase performance by compiling and linking your code with the -O5 option.
Compiling at -O5 consumes more time and machine resources than any other optimization level, particularly if you include -O5 on the IPA link step. Compile at -O5 only as the final phase in your optimization process, after successfully compiling and executing your application at -O4.
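For a mixed-language application, a final-phase -O5 build could be sketched as below. The xlf95 and xlc invocation commands, the file names, and the run helper are assumptions for illustration, and the sketch presumes that the same build already compiles and runs correctly at -O4.

```shell
FC=xlf95     # assumed XL Fortran invocation command
CC=xlc       # assumed XL C invocation command
OPT="-O5"    # switch from -O4 only after the -O4 build runs correctly

# Hypothetical helper: print the command, then run it only if the program exists.
run() {
  echo "$@"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Compile the C and Fortran sources at the same optimization level.
run "$CC" $OPT -c util.c
run "$FC" $OPT -c main.f90

# Link step: repeat -O5; expect this step to take the most time and memory.
run "$FC" $OPT main.o util.o -o app
```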
See the -O option in the XL Fortran Compiler Reference for information on the -O level syntax.