By default, the compiler performs only quick local optimizations such as
constant folding and elimination of local common sub-expressions, while still
allowing full debugging support. You can optimize your program further by
specifying an optimization level. Higher levels provide increasing application
performance, at the expense of larger program size and reduced debugging
support. The options you can specify are summarized in the following
table, and more detailed descriptions of the techniques used at each
optimization level are provided below.
Table 8. Optimization levels

| Option | Behavior |
| --- | --- |
| -O or -O2 or -qoptimize or -qoptimize=2 | Comprehensive low-level optimization; partial debugging support. |
| -O3 or -qoptimize=3 | More extensive optimization; some precision trade-offs. |
| -O4 or -qoptimize=4, -O5 or -qoptimize=5 | Interprocedural optimization; loop optimization; automatic machine tuning. |
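
To make the default optimizations named at the start of this section concrete, the following hand-written C sketch shows a function before and after constant folding and local common sub-expression elimination. The function names and values are hypothetical, and the second function only approximates by hand what a compiler might do internally; it is not actual compiler output.

```c
/* Before: the code as a programmer might write it. */
int calc_before(int w, int h)
{
    int seconds_per_day = 24 * 60 * 60;  /* constant expression */
    int a = (w + h) * 2;
    int b = (w + h) * 3;                 /* repeats the sub-expression w + h */
    return a + b + seconds_per_day;
}

/* After: a hand-written equivalent (illustrative only).
 * Constant folding replaces 24 * 60 * 60 with 86400, and the common
 * sub-expression w + h is computed once and reused. */
int calc_after(int w, int h)
{
    int t = w + h;
    return t * 2 + t * 3 + 86400;
}
```

Both functions return the same value for the same arguments; only the amount of work done at run time differs.
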
At optimization level 2, the compiler is conservative in the optimization
techniques it applies and should not affect program correctness. At
optimization level 2, the following techniques are used:
At optimization levels 3 and above, the compiler is more aggressive, making
changes to program semantics that will improve performance even if there is
some risk that these changes will produce different results. Here are
some examples:
- In some cases, X*Y*Z will be calculated as X*(Y*Z) instead of
(X*Y)*Z. This could produce a different result due to rounding.
- In some cases, the sign of a negative zero value will be lost. This
could produce a different result if you multiply the value by infinity.
See "Getting the most out of optimization levels 2 and 3" for some suggestions for mitigating this risk.
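
The two floating-point examples listed above can be reproduced in a small C sketch. It assumes IEEE 754 double-precision arithmetic, and the function names are hypothetical. The first two functions pin down the grouping of a three-way product by storing the intermediate result in a volatile temporary. The last function detects a negative zero, which compares equal to positive zero but still carries a sign bit.

```c
#include <math.h>

/* Computes (x*y)*z. The volatile temporary forces the intermediate
 * product to be rounded to double before the second multiplication. */
double mul_left_first(double x, double y, double z)
{
    volatile double t = x * y;
    return t * z;
}

/* Computes x*(y*z) with the same technique. */
double mul_right_first(double x, double y, double z)
{
    volatile double t = y * z;
    return x * t;
}

/* Returns 1 if x is a zero whose sign bit is set, otherwise 0.
 * Note that -0.0 == 0.0 is true, so the comparison alone cannot
 * tell the two zeros apart. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x);
}
```

Under these assumptions, mul_left_first(1e308, 10.0, 0.1) overflows to infinity because the intermediate product exceeds the largest finite double, while mul_right_first(1e308, 10.0, 0.1) stays finite. Overflow is used here only because it makes the effect of regrouping easy to see; ordinary rounding differences are usually much smaller.
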
At optimization level 3, all of the techniques in optimization level 2 are
used, plus the following:
- Unrolling deeper loops and improving loop scheduling.
- Increasing the scope of optimization.
- Performing optimizations with marginal or niche effectiveness, which might
not help all programs.
- Performing optimizations that are expensive in compile time or
space.
- Reordering some floating-point computations, which might produce precision
differences or affect the generation of floating-point-related exceptions
(equivalent to compiling with the -qnostrict
option).
- Eliminating implicit memory usage limits (equivalent to compiling with the
-qmaxmem=-1 option).
- Increasing automatic inlining.
- Propagating constants and values through structure
copies.
- Removing the "address taken" attribute if possible after other
optimizations.
- Grouping loads, stores and other operations on contiguous aggregate
members, in some cases using VMX vector register operations.
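
As an illustration of the first technique in this list, the following C sketch shows a simple loop and a hand-unrolled equivalent. The function names and the unroll factor of 4 are assumptions chosen for illustration; the compiler picks its own factor and scheduling, and its generated code will look different.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4 (illustrative only): one loop test per four
 * elements, followed by a cleanup loop for the remaining 0 to 3. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

The unrolled version is larger, which is one reason higher optimization levels can increase program size.
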
At optimization levels 4 and 5, all of the techniques in optimization
levels 2 and 3 are used, plus the following:
- Interprocedural analysis, which invokes the optimizer at link time to
perform optimizations across multiple source files (equivalent to compiling
with the -qipa option).
- High-order transformations, which provide optimized handling of loop nests
and array language constructs (equivalent to compiling with the -qhot option).
- Hardware-specific optimization (equivalent to compiling with the -qarch=auto, -qtune=auto, and -qcache=auto options).
- At optimization level 5, more detailed interprocedural analysis (equivalent
to compiling with the -qipa=level=2 option). With level 2 IPA, high-order
transformations (equivalent to compiling with -qhot) are delayed until link
time, after whole-program information has been collected.
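
High-order transformations operate on loop nests. One classic loop-nest transformation is loop interchange, sketched by hand below in C. The array dimensions and function names are hypothetical, and this section does not state which specific transformations -qhot applies to any given loop, so treat this as an example of the general kind of rewrite rather than a description of the compiler's behavior.

```c
#define ROWS 4
#define COLS 3

/* Column-by-column traversal: in C's row-major layout, consecutive
 * iterations of the inner loop touch elements COLS ints apart. */
long sum_by_columns(int m[ROWS][COLS])
{
    long total = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}

/* After interchanging the two loops (illustrative only): the inner
 * loop now walks adjacent elements, which is friendlier to the cache. */
long sum_by_rows(int m[ROWS][COLS])
{
    long total = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is visited changes.
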
Getting the most out of optimization levels 2 and 3
Here is a recommended approach to using optimization levels 2 and 3:
- If possible, test and debug your code without optimization before using -O2.
- Ensure that your code complies with its language standard.
- In C code, ensure that the use of pointers follows the type
restrictions: generic pointers should be char* or
void*. Also check that all shared variables and pointers to
shared variables are marked volatile.
- In C, use the -qlibansi compiler option unless
your program defines its own functions with the same names as library
functions.
- Compile as much of your code as possible with -O2.
- If you encounter problems with -O2, consider using -qalias=noansi rather than turning off
optimization.
- Next, use -O3 on as much code as possible.
- If you encounter problems or performance degradations, consider using -qstrict or -qcompact along with -O3 where
necessary.
- If you still have problems with -O3, switch to -O2
for a subset of files, but consider using -qmaxmem=-1, -qnostrict, or both.
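
The pointer guidance in this list can be sketched in C as follows. The function names are hypothetical. The first function accepts arbitrary data through a void* parameter and reads it through a character pointer (unsigned char* here, which is also a character type). The second copies the bytes of a float with memcpy rather than casting a float* to an integer pointer type, a cast that would break the type restrictions. The second function assumes that float is a 32-bit IEEE 754 type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic access: any object may be inspected byte by byte through
 * a character pointer without violating the type restrictions. */
unsigned long sum_bytes(const void *data, size_t len)
{
    const unsigned char *p = data;
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++)
        total += p[i];
    return total;
}

/* Reads the bit pattern of a float without writing
 * *(uint32_t *)&f, which accesses a float through an incompatible
 * pointer type. Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written this way leaves less reason to fall back on -qalias=noansi, because the optimizer's assumptions about pointer types hold.
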
