Using profile-directed feedback

You can use profile-directed feedback (PDF) to tune the performance of your application for a typical usage scenario. The compiler optimizes the application based on an analysis of how often branches are taken and blocks of code are executed. Because the process requires compiling the entire application twice, it is intended to be used after other debugging and tuning is finished, as one of the last steps before putting the application into production.

The following diagram illustrates the PDF process.

Figure 2. Profile-directed feedback

You first compile the program with the -qpdf1 option (with a minimum optimization level of -O). You then generate profile data by running the compiled program in the same ways that users will typically use it. Finally, you compile the program again with the -qpdf2 option, which optimizes the program based on the profile data by invoking -qipa=level=0.

Note that you do not need to compile all of the application's code with the -qpdf1 option to benefit from the PDF process; in a large application, you might want to concentrate on those areas of the code that can benefit most from optimization.

To use the -qpdf options:

  1. Compile some or all of the source files in the application with -qpdf1 and a minimum optimization level of -O.
  2. Run the application using a typical data set or several typical data sets. It is important to use data that is representative of the data that will be used by your application in a real-world scenario. When the application exits, it writes profiling information to the PDF file in the current working directory or the directory specified by the PDFDIR environment variable.
  3. Compile the application with -qpdf2.
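
The three steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official build recipe: the source file app.c, the executable app, the input file typical_input.dat, and the pdf_data directory are hypothetical names, and passing -O again in step 3 is an assumption (the text above requires it only for -qpdf1). The run helper only prints each command unless DRY_RUN=0 is set, so the script can be read and tried on a machine without the compiler installed.

```shell
#!/bin/sh
# Sketch of the basic PDF workflow. All file and directory names are hypothetical.

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pdf_workflow() {
    src=$1      # hypothetical source file, e.g. app.c
    exe=$2      # hypothetical executable name, e.g. app
    input=$3    # hypothetical representative input file

    # Profile data goes to the current directory unless PDFDIR is set.
    # Using a dedicated directory here is an illustrative choice.
    PDFDIR=${PDFDIR:-$PWD/pdf_data}
    export PDFDIR
    run mkdir -p "$PDFDIR"

    # Step 1: instrumented build.
    run xlc -qpdf1 -O -o "$exe" "$src"
    # Step 2: training run; the PDF file is written when the program exits.
    run "./$exe" "$input"
    # Step 3: rebuild using the collected profile (-O here is an assumption).
    run xlc -qpdf2 -O -o "$exe" "$src"
}

pdf_workflow app.c app typical_input.dat
```

Keeping the profile data in its own directory makes it easier to tell which training runs contributed to a given optimized build.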

You can take more control of the PDF file generation, as follows:

  1. Compile some or all of the source files in the application with -qpdf1 and a minimum optimization level of -O.
  2. Run the application using a typical data set or several typical data sets. This produces a PDF file in the current directory.
  3. Set the PDFDIR environment variable to a different directory so that the next PDF file is written there.
  4. Re-compile the application with -qpdf1.
  5. Repeat steps 3 and 4 as often as you want.
  6. Use the mergepdf utility to combine the PDF files into one PDF file. For example, if you produce three PDF files that represent usage patterns that will occur 53%, 32%, and 15% of the time, respectively, you can use this command, in which each -r value weights the PDF file that follows it:
      mergepdf -r 53 path1 -r 32 path2 -r 15 path3
  7. Compile the application with -qpdf2.
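
When the weights and profile directories come from a list, the mergepdf command line in step 6 can be assembled mechanically. The helper below is a hypothetical convenience function, not part of the compiler. It only builds the "-r weight path" arguments in the form shown above and prints the resulting command without running it. It assumes paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: build a mergepdf command line from weight/path pairs.
# Usage: merge_cmd WEIGHT1 PATH1 [WEIGHT2 PATH2 ...]
# Assumes paths contain no whitespace.
merge_cmd() {
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "usage: merge_cmd WEIGHT PATH [WEIGHT PATH ...]" >&2
        return 1
    fi
    cmd="mergepdf"
    while [ "$#" -gt 0 ]; do
        # Each pair becomes "-r WEIGHT PATH", as in the example in step 6.
        cmd="$cmd -r $1 $2"
        shift 2
    done
    echo "$cmd"
}

# Prints: mergepdf -r 53 path1 -r 32 path2 -r 15 path3
merge_cmd 53 path1 32 path2 15 path3
```

Printing the command instead of executing it lets you review the weights before the merged PDF file is produced.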

To collect more detailed information on function call and block statistics, do the following:

  1. Compile the application with -qpdf1 -qshowpdf -O.
  2. Run the application using a typical data set or several typical data sets. The application writes more detailed profiling information in the PDF file.
  3. Use the showpdf utility to view the information in the PDF file.

To erase the information in the PDF directory, use the cleanpdf utility or the resetpdf utility.

Example of compilation with PDF and showpdf

The following example shows how you can use PDF with the showpdf utility to view the call and block statistics for a "Hello World" application.

The source for the program file hello.c is as follows:

#include <stdio.h>
void HelloWorld(void)
{
    printf("Hello World");   /* the only call made from HelloWorld */
}
int main(void)
{
    HelloWorld();            /* the only call made from main */
    return 0;
}
 
  1. Compile the source file:
    xlc -qpdf1 -qshowpdf -O hello.c
  2. Run the resulting program executable a.out.
  3. Run the showpdf utility to display the call and block counts for the executable:
    showpdf

The results will look similar to the following:

HelloWorld(4):  1 (hello.c)
 
Call Counters:
5 | 1  printf(6)
 
Call coverage = 100% ( 1/1 )
 
Block Counters:
3-5 | 1
6 |
6 | 1
 
Block coverage = 100% ( 2/2 )
  
-----------------------------------
main(5):  1 (hello.c)
 
Call Counters:
10 | 1  HelloWorld(4)
 
Call coverage = 100% ( 1/1 )
 
Block Counters:
8-11 | 1
11 |
 
Block coverage = 100% ( 1/1 )
 
Total Call coverage = 100% ( 2/2 )
Total Block coverage = 100% ( 3/3 )
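
To check whether a training run exercised enough of the program, the total coverage figure can be pulled out of a report like the one above. The function below is a hypothetical helper that parses only the "Total Block coverage" line in the format shown. Piping showpdf into it assumes that showpdf writes its report to standard output.

```shell
#!/bin/sh
# Hypothetical helper: read a showpdf-style report on stdin and print the
# percentage from its "Total Block coverage" line, without the % sign.
total_block_coverage() {
    awk '/^Total Block coverage/ { gsub(/%/, "", $5); print $5 }'
}

# Example using the last two lines of the report shown above; prints 100.
printf '%s\n' \
    'Total Call coverage = 100% ( 2/2 )' \
    'Total Block coverage = 100% ( 3/3 )' | total_block_coverage
```

Under the stated assumption, a real run would look like "showpdf | total_block_coverage", and a build script could refuse to proceed to the -qpdf2 step when the figure is low.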
 
IBM Copyright 2003