Inlining is the process of replacing a subroutine or function call at the call site with the body of the subroutine or function being called. This eliminates call-linkage overhead and can expose significant optimization opportunities. For example, with inlining, the optimizer can replace the dummy arguments in the body of the subprogram with the actual arguments passed at the call site. The trade-offs of inlining can include code bloat and source code that is harder to debug.
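To make the idea concrete, the following sketch performs by hand the transformation that the compiler applies automatically. All names (inline_demo, scale_by, apply_call, apply_inlined) are hypothetical, and the code is laid out so that it is valid in either fixed or free source form. apply_inlined is what apply_call looks like after scale_by has been inlined: the dummy arguments x and factor are replaced by the actual arguments a(i) and 2.0, so the loop no longer contains a call.

```fortran
      module inline_demo
      implicit none
      contains

! A small function: every call to it pays call-linkage overhead.
      pure real function scale_by(x, factor)
      real, intent(in) :: x, factor
      scale_by = x * factor
      end function scale_by

! Original caller: invokes scale_by once per array element.
      subroutine apply_call(a, n)
      integer, intent(in) :: n
      real, intent(inout) :: a(n)
      integer :: i
      do i = 1, n
         a(i) = scale_by(a(i), 2.0)
      end do
      end subroutine apply_call

! The same loop after inlining by hand: the body of scale_by
! replaces the call, with the dummy arguments x and factor
! replaced by the actual arguments a(i) and 2.0.
      subroutine apply_inlined(a, n)
      integer, intent(in) :: n
      real, intent(inout) :: a(n)
      integer :: i
      do i = 1, n
         a(i) = a(i) * 2.0
      end do
      end subroutine apply_inlined

      end module inline_demo
```

Both subroutines compute the same result. Once the call is gone, the optimizer sees the constant 2.0 directly in the loop body, which is the kind of opportunity inlining exposes.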
If your application contains many calls to small procedures, the procedure call overhead can sometimes increase the execution time of the application considerably. Specifying the -qipa=inline compiler option can reduce this overhead. Additionally, you can use the -p or -pg options and profiling tools to determine which subprograms your application calls most frequently, and then list their names with -qipa=inline to encourage the compiler to inline them.
The -qipa option can perform inlining where the calling and called procedures are in different compilation units.
# Let the compiler decide (relatively cautiously) what to inline.
xlf95 -O3 -qipa=inline inline.f

# Encourage the compiler to inline particular subprograms.
xlf95 -O3 -qipa=inline=called_100_times,called_1000_times inline.f
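As a hypothetical illustration of the second command, inline.f might contain small external functions whose names reflect how often they are called. Only the names called_100_times and called_1000_times come from the command line above; the function bodies, the driver run_hot_loops, and the loop counts are assumptions. External (non-module) procedures are used so that the names on the command line are the procedures' own names.

```fortran
! Hypothetical contents of inline.f. Only the two subprogram
! names come from the command line; the bodies are made up.
      integer function called_100_times(i)
      implicit none
      integer, intent(in) :: i
      called_100_times = i + 1
      end function called_100_times

      integer function called_1000_times(i)
      implicit none
      integer, intent(in) :: i
      called_1000_times = 2 * i
      end function called_1000_times

! Driver: each small function is invoked as many times as its
! name suggests, so call overhead accumulates in the loops.
      integer function run_hot_loops()
      implicit none
      integer, external :: called_100_times, called_1000_times
      integer :: i, total
      total = 0
      do i = 1, 100
         total = total + called_100_times(i)
      end do
      do i = 1, 1000
         total = total + called_1000_times(i)
      end do
      run_hot_loops = total
      end function run_hot_loops
```

Functions this small, called from loops like these, are the pattern that profiling with -p or -pg tends to surface and that naming them in -qipa=inline=... is meant to address.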
A common problem in application optimization is excessive inlining, which can actually decrease performance: larger programs can cause more frequent cache misses and page faults. Because the XL compilers contain safeguards to prevent excessive inlining, subprograms that you want to inline are sometimes not inlined automatically when you specify -qipa=inline.
Some common conditions that prevent -qipa=inline from inlining particular subprograms are:
Consider an example with three procedures, where A is the caller, and B and C are at the upper size limit for automatic inlining. They are all in the same file, which you would compile as follows:
xlf -qipa=inline=c file.f
Specifying -qipa=inline=c means that calls to C are more likely to be inlined. If B and C were twice as large as the upper size limit for automatic inlining, no calls to B would be inlined, but some calls to C would still be inlined.
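A hypothetical layout for file.f is sketched below. Only the names A, B, and C and the call structure (A calls B and C) come from the text; the argument lists and bodies are placeholders, and in the scenario described, B and C would be much larger, near the automatic-inlining size limit. Because Fortran names are not case-sensitive, the command line can name subroutine C as lowercase c.

```fortran
! Hypothetical layout of file.f. A is the caller; B and C stand
! in for subprograms near the automatic-inlining size limit
! (their real bodies would be much larger than these).
      subroutine a(x, y)
      implicit none
      real, intent(inout) :: x, y
      external b, c
      call b(x)
      call c(y)
      end subroutine a

      subroutine b(v)
      implicit none
      real, intent(inout) :: v
! Placeholder body. B is not named on the command line, so
! it is left to the compiler's automatic size limits.
      v = v + 1.0
      end subroutine b

      subroutine c(v)
      implicit none
      real, intent(inout) :: v
! Placeholder body. C is named in -qipa=inline=c, so calls
! to it are more likely to be inlined into A.
      v = v * 3.0
      end subroutine c
```

The placeholders keep the file compilable; the size-limit behavior described above only becomes visible once B and C are realistically large.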
To change the size limits that control inlining, you can specify -qipa=limit=n, where n is 0 through 9. Larger values allow more inlining.
It is possible to inline C/C++ functions into Fortran programs, and Fortran functions into C/C++ programs, during link-time optimization. To do so, you must compile the C/C++ code with the IBM XL C/C++ compilers using -qipa and an option set compatible with the one used for the IBM XL Fortran compilation.