If a program has many subprogram calls, you can use the -qipa=inline option to turn on inlining, which reduces the overhead of those calls. Consider using the -pg option with gprof (or the -p option with prof) to determine which subprograms are called most frequently, and then list their names as -qipa=inline suboptions on the command line.
To make inlining apply to calls where the calling and called subprograms are in different scopes, include the -qipa option.
# Let the compiler decide (relatively cautiously) what to inline.
xlf95 -O3 -qipa=inline inline.f
# Encourage the compiler to inline particular subprograms.
xlf95 -O3 -qipa=inline=called_100_times,called_1000_times inline.f
# Explicitly extend the inlining to calls across multiple files.
xlf95 -O3 -qipa=inline=called_100_times,called_1000_times -qipa inline.f
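Going from a profile to the list of names can be scripted. The function below is a minimal sketch, assuming the GNU gprof flat-profile layout (call count in the fourth column, subprogram name in the seventh); the name top_inline and the default of three names are hypothetical choices, not part of XL Fortran. gprof may report a name differently from how it is spelled in the source, so check the result before using it.

```sh
# top_inline N: read a gprof flat profile on standard input and print a
# -qipa=inline=... option naming the N subprograms with the most calls.
# Hypothetical helper; assumes the GNU gprof flat-profile layout:
#   %time  cumulative  self  calls  self-ms/call  total-ms/call  name
top_inline() {
  awk '
    # Data rows start with a numeric %time and have an integer call
    # count in column 4; rows without a call count are skipped.
    NF >= 7 && $1 ~ /^[0-9.]+$/ && $4 ~ /^[0-9]+$/ { print $4, $7 }
  ' |
    sort -k1,1nr |          # most frequently called first
    head -n "${1:-3}" |     # keep the top N (default 3)
    awk '
      # Join the names with commas, as in -qipa=inline=a,b.
      { names = names (NR > 1 ? "," : "") $2 }
      END { if (NR > 0) print "-qipa=inline=" names }
    '
}
```

For example, after building with -pg and running the program once, a command such as `gprof -b -p ./a.out gmon.out | top_inline 2` (the -b and -p flags are GNU gprof options for a brief, flat-profile-only report) prints an option that can be added to an xlf95 command line like the ones shown earlier.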
Getting the right amount of inlining for a particular program may require some work on your part. The compiler has a number of safeguards and limits to avoid doing an excessive amount of inlining. Without them, excessive inlining could reduce overall optimization because of storage constraints during compilation, or could make the resulting program much larger and slower because of more frequent cache misses and page faults. However, these safeguards may also prevent the compiler from inlining subprograms that you do want inlined. If this happens, you will need to analyze the program, rework it, or both to get the performance benefit.
As a general rule, consider identifying a few subprograms that are called most often, and inline only those subprograms.
Some common conditions that prevent -qipa=inline from inlining particular subprograms are:
Consider an example with three procedures: A is the caller, and B and C are at the upper size limit for automatic inlining. They are all in the same file, which is compiled like this:
xlf -qipa=inline=c file.f
The -qipa=inline=c option means that calls to C are more likely to be inlined. If B and C were twice as large, calls to B would not be inlined at all, while some calls to C could still be inlined.
Although these limits might prevent some calls from A to B or from A to C from being inlined, the process starts over for the next subprogram after the compiler finishes processing A.
To change the size limits that control inlining, you can specify -qipa=limit=n, where n is 0 through 9. Larger values allow more inlining.
C/C++ functions can be inlined into Fortran programs, and vice versa, during link-time optimization. For this to happen, the C/C++ code must be compiled with the IBM XL C/C++ compilers using -qipa and an option set compatible with the one used for the XLF compilation.