Using interprocedural analysis

Interprocedural analysis (IPA) enables the compiler to optimize across source files (whole-program analysis), which can result in significant performance improvements. You can specify interprocedural analysis on the compile step only, or on both the compile and link steps ("whole program" mode). The exceptions are the clonearch and cloneproc suboptions, which must be specified on the link step. Whole program mode expands the scope of optimization to an entire program unit, which can be an executable or a shared object. Because IPA can significantly increase compilation time, limit its use to the final performance tuning stage of development.

You enable IPA by specifying the -qipa option. The most commonly used suboptions and their effects are described in the following table. The full set of suboptions and their syntax is described in the reference entry for -qipa.

Table 14. Commonly used -qipa suboptions
Suboption  Behavior
level=0 Program partitioning and simple interprocedural optimization, which consists of:
  • Automatic recognition of standard libraries.
  • Localization of statically bound variables and procedures.
  • Partitioning and layout of procedures according to their calling relationships. (Procedures that call each other frequently are located closer together in memory.)
  • Expansion of scope for some optimizations, notably register allocation.
level=1 Inlining and global data mapping. Specifically:
  • Procedure inlining.
  • Partitioning and layout of static data according to reference affinity. (Data that is frequently referenced together will be located closer together in memory.)
This is the default level if you do not specify any suboptions with the -qipa option.
level=2 Global alias analysis, specialization, interprocedural data flow:
  • Whole-program alias analysis. This level includes the disambiguation of pointer dereferences and indirect function calls, and the refinement of information about the side effects of a function call.
  • Intensive intraprocedural optimizations. These can take the form of value numbering, code propagation and simplification, code motion into conditions or out of loops, and elimination of redundancy.
  • Interprocedural constant propagation, dead code elimination, pointer analysis, code motion across functions, and interprocedural strength reduction.
  • Procedure specialization (cloning).
  • Whole program data reorganization.
inline=variable Allows precise control over function inlining.
clonearch=arch_list Specifies one or more architectures for which optimized instructions can be generated. Supported architecture values are PWR4, PWR5, and PPC970. For every function in your program, the compiler generates a generic version of the instructions, according to the -qarch value in effect, and, if appropriate, clones specialized versions for the architectures you specify in this suboption. The compiler inserts code into your application to check the processor architecture at run time and to select the version of the generated instructions that is optimized for the runtime environment.
cloneproc=func_list Specifies the exact functions to clone for the architectures named in the clonearch suboption.
fine_tuning Other -qipa suboptions let you specify the behavior of library code, tune program partitioning, and read commands from a file, among other controls.