You do not need to hand-optimize your code. Hand-optimization can introduce unusual constructs that obscure the intent of your application from the compiler and limit optimization opportunities.
Large programs, especially those that take advantage of 64-bit capabilities, can consume significant address space. Use 64-bit mode only if your application requires the additional address space that it provides.
Avoid breaking your program into too many small functions, as this can increase the percentage of time the program spends on call overhead. If you choose to use many small functions, compiling with -qipa can help minimize the performance impact. Attempting to optimize an application with many small functions without -qipa can severely limit the scope of other optimizations.
Using command invocations like xlf90 and xlf95 will enhance standards conformance and code portability.
Specifying -qnosave sets the default storage class of all variables to automatic. This provides more opportunities for optimization. The xlf90, xlf95, xlf90_r, and xlf95_r command invocations use -qnosave by default.
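When the default storage class is automatic, a local variable that must keep its value between calls should be given the SAVE attribute explicitly. The following is a minimal sketch of that idea. The module name `counters`, the subroutine `tick`, and the variable names are hypothetical and chosen only for illustration.

```fortran
! Hypothetical example: all names here are illustrative only.
module counters
  implicit none
contains
  subroutine tick(count_out)
    integer, intent(out) :: count_out
    ! Explicit SAVE: the value must persist across calls, so do not
    ! rely on the default storage class.
    integer, save :: calls = 0
    ! No SAVE: automatic storage when -qnosave is in effect.
    integer :: scratch

    scratch = calls + 1
    calls = scratch
    count_out = calls
  end subroutine tick
end module counters

program nosave_demo
  use counters, only: tick
  implicit none
  integer :: n

  call tick(n)
  call tick(n)
  print *, n   ! number of calls made so far
end program nosave_demo
```

Declaring SAVE only where persistence is required leaves the remaining locals automatic.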
Use modules to group related subroutines and functions.
Use module variables instead of common blocks for global storage.
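As an illustration of the two preceding recommendations, the sketch below groups a related function and its global data in one module instead of a common block. The module name `grid_data` and all variable and procedure names are hypothetical.

```fortran
! Hypothetical module: names are illustrative only.
module grid_data
  implicit none
  private
  public :: nx, ny, cell_count

  ! Module variables take the place of a COMMON block for global storage.
  integer :: nx = 4
  integer :: ny = 3

contains

  ! Related procedures are grouped with the data they operate on.
  pure integer function cell_count(nx_in, ny_in)
    integer, intent(in) :: nx_in, ny_in
    cell_count = nx_in * ny_in
  end function cell_count

end module grid_data

program module_demo
  use grid_data, only: nx, ny, cell_count
  implicit none

  print *, cell_count(nx, ny)
end program module_demo
```

Because the module supplies explicit interfaces and typed names through the USE statement, the compiler can check every reference, which a common block cannot guarantee.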
Mark as VOLATILE all data objects that are accessed or manipulated by independent I/O processes or by independent, asynchronously interrupting processes. For example, mark shared variables and pointers to shared variables. Apply VOLATILE sparingly, however: it is a barrier to optimization, because accessing a VOLATILE object forces the compiler to always load the value from storage. This prevents powerful optimizations such as constant propagation and invariant code motion.
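The following minimal sketch shows the VOLATILE attribute applied to a single variable while the rest of the data stays open to optimization. The variable `ready_flag` is hypothetical and is only assumed to be modified by some asynchronous agent; in this self-contained example nothing actually changes it.

```fortran
! Hypothetical example: ready_flag stands in for a variable that an
! asynchronous process might modify. Nothing modifies it here.
program volatile_demo
  implicit none
  integer, volatile :: ready_flag   ! always reloaded from storage
  integer :: factor, i              ! ordinary variables, freely optimized
  real :: total

  ready_flag = 0
  factor = 2
  total = 0.0

  do i = 1, 10
    ! Each reference to ready_flag forces a load from storage,
    ! so this test cannot be hoisted out of the loop.
    if (ready_flag /= 0) exit
    total = total + real(factor * i)
  end do

  print *, total
end program volatile_demo
```

Only `ready_flag` carries the attribute, so the compiler can still keep `factor` in a register and treat it as loop invariant.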
The XL compilers support high performance libraries that can provide significant advantages over custom implementations or generic libraries.