Getting started with optimization

Simple compilation is a translation or transformation of the source code into an executable or shared object. An optimizing transformation is one that gives your application better overall performance at run time. XL C/C++ provides a portfolio of optimizing transformations tailored to the PowerPC architecture. Their aim is to make your application run faster.

Significant performance improvements can be achieved with relatively little development effort if you understand the available controls that affect the transformation of well-written code. Programming models such as OpenMP allow you to write high-performance code. This section describes some of the optimizations the compiler can perform to help you balance the trade-offs among run-time performance, hand-coded micro-optimizations, general readability, and overall portability of your source code.
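As a rough illustration of the OpenMP programming model mentioned above, the following sketch parallelizes a dot product with a single directive. The function name dot is made up for this example, and it assumes that OpenMP support has been enabled through the appropriate compiler option, which is not covered in this section. When OpenMP is not enabled, the pragma is ignored and the loop runs serially with the same result.

```c
/* Dot product of two vectors of length n.
   The directive asks the compiler to split the loop across threads and
   to combine the per-thread partial sums into sum (a reduction). */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The source stays close to the serial version, which keeps it readable and portable while leaving the parallelization to the compiler and run time.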

This discussion assumes that you have used a profiler to identify the areas in your code where optimization might be appropriate.

Optimizations are often attempted in the later phases of application development cycles, such as product release builds. If possible, you should test and debug your code without optimization before attempting to optimize it. Embarking on optimization should mean that you have chosen the most efficient algorithms for your program and that you have implemented them correctly. To a large extent, compliance with language standards is directly related to the degree to which your code can be successfully optimized. Optimizers are the ultimate conformance test!
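One common way in which non-conforming code can break under optimization is type punning through an incompatible pointer type. The sketch below is a general C example, not something specific to XL C/C++; the function name float_bits is made up, and the code assumes that float and unsigned int have the same size on the target platform.

```c
#include <string.h>

/* Non-conforming variant, shown only as a comment:
 *
 *     return *(unsigned int *)&f;
 *
 * It reads a float object through an unsigned int lvalue. An optimizer
 * that relies on the standard aliasing rules may reorder or remove such
 * an access, so code like this can work unoptimized and fail when
 * optimized.
 */

/* Conforming variant: copy the object representation with memcpy.
   Assumes sizeof(float) == sizeof(unsigned int). */
unsigned int float_bits(float f)
{
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Optimizers typically recognize a small fixed-size memcpy like this one, so the conforming version need not be slower than the cast.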

Optimization is controlled by compiler options, directives, and pragmas. However, compiler-friendly programming idioms can be as useful to performance as any of the options or directives. It is no longer necessary, nor is it recommended, to hand-optimize your code excessively (for example, by manually unrolling loops). Unusual constructs can confuse the compiler (and other programmers), and can make your application difficult to optimize for new machines.
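To make the point about hand-optimization concrete, the sketch below writes the same summation twice: once as a plain loop and once manually unrolled by four. The function names sum_simple and sum_unrolled are made up for this example. Both return the same value; the plain form leaves unrolling and scheduling to the compiler and is easier to read and maintain.

```c
#include <stddef.h>

/* Plain loop: compiler-friendly and easy to read. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Manually unrolled by four: same result, but more code and an extra
   remainder loop that has to be kept correct by hand. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 3 < n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++) {   /* leftover elements when n is not a multiple of 4 */
        total += a[i];
    }
    return total;
}
```

The unroll factor of four is also a guess about one machine; the right factor for a different processor may not be the same, which is one reason to leave this decision to the compiler.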

Not all optimizations are beneficial for all applications. A trade-off usually has to be made between the degree of optimization done by the compiler and its cost: an increase in compile time, accompanied by reduced debugging capability.

Selected compiler options for optimization

The following table lists a selection of basic compiler options for optimizing program performance. For an exhaustive list, see the XL C/C++ Programming Guide. For documentation of the available suboptions, see the XL C/C++ Compiler Reference or the options man page.

Table 1. Basic compiler options for optimization

Option Description
-qnoopt The compiler performs very limited optimization. This is the default. Before you start optimizing your application, ensure that it compiles successfully with -qnoopt.
-O2 The compiler performs comprehensive low-level optimization, which includes graph coloring, common subexpression elimination, dead code elimination, algebraic simplification, constant propagation, instruction scheduling for the target machine, loop unrolling, and software pipelining.
-qarch, -qtune, -qcache The compiler takes advantage of the characteristics of the specific hardware and instruction set where the application runs. Use -qarch to specify the family of processor architectures for which application code should be generated. Use -qtune to bias optimization toward execution on a given microprocessor. Use -qcache to specify a particular cache or memory geometry.
-qpdf1, -qpdf2 When specified with an optimization level of -O or higher, the compiler uses profile-directed feedback to optimize the application based on an analysis of how often different sections of code are typically executed. The PDF process is most useful for applications that contain unstructured branching.
-O3 The compiler performs more aggressive optimization than at -O2: deeper loop unrolling, better loop scheduling, and elimination of the limits on implicit memory usage.
-qhot The compiler performs high-order transformations, which provide additional loop optimization and optionally perform array padding. This option is most useful for scientific applications that perform a large amount of numerical processing.
-qipa The compiler performs interprocedural analysis to optimize the entire application as a unit (whole-program analysis). This option is most useful for business applications that contain a large number of frequently used routines. It is also useful for C++ programs with a high level of abstraction. In many cases, this option significantly increases compilation time.
-O4 This is equivalent to -O3 -qipa -qhot -qarch=auto -qtune=auto -qcache=auto. If the compilation takes too long, try compiling with -O4 -qnoipa.
-O5 This is equivalent to -O4 -qipa=level=2. On the Linux platform, this option also turns on -qhot=vector -qhot=simd, provided that the processor is PowerPC 970 and that AltiVec data types are supported by the operating system.
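As a hedged sketch of what some of the -O2 transformations named in Table 1 mean in practice, the example below shows a function as a programmer might write it and a hand-simplified equivalent. The function names, the file name kernel.c, and the build lines in the comment are all hypothetical; the code that XL C/C++ actually generates is not shown in this document and may differ.

```c
/* Hypothetical build sequence (kernel.c is a made-up file name):
 *
 *   xlc -qnoopt kernel.c -o kernel      confirm that the code compiles and runs
 *   xlc -O2 kernel.c -o kernel          enable low-level optimization
 *   xlc -O3 -qhot kernel.c -o kernel    try more aggressive loop optimization
 *
 * Profile-directed feedback is a two-step process: compile with -qpdf1,
 * run the program on typical input, then recompile with -qpdf2.
 */

/* As written by the programmer. */
int area_cost_naive(int w, int h)
{
    int scale = 4;                /* constant propagation: always 4 */
    int unused = w * h * 100;     /* dead code: the value is never read */
    int cost = (w * h) * scale    /* w * h is computed twice:        */
             + (w * h);           /* a common subexpression          */
    return cost;
}

/* Roughly what remains after those transformations plus algebraic
   simplification: area * 4 + area becomes area * 5. */
int area_cost_simplified(int w, int h)
{
    int area = w * h;
    return area * 5;
}
```

Because the optimizer can make this kind of reduction itself, the naive form can be kept when it expresses the intent more clearly.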
IBM Copyright 2003