By default, the compiler generates code that runs on all supported systems, though not optimally on every one of them. By selecting options to target the appropriate architectures, you can optimize your application to suit the broadest possible selection of relevant processors, a range of processors within a given family, or a specific processor. The compiler options listed in the Options for targeting your architecture table control optimizations that affect individual aspects of your target architecture. This section also explains in more detail how you can use some of those options to get the best possible performance on those targets.
Option | Behavior |
---|---|
-q32 | Generates code for a 32-bit addressing model (32-bit execution mode). |
-q64 | Generates code for a 64-bit addressing model (64-bit execution mode). |
-qarch | Selects a family of processor architectures, or a specific architecture that the compiler will generate machine instructions for. If you specify multiple architecture settings, only the last architecture is considered valid. |
-qtune | Focuses optimizations for execution on a given processor without restricting the processor architectures that your application can execute on. If you specify multiple architecture settings, only the last architecture is considered valid. |
-qcache | Defines a specific cache or memory geometry. Selecting a predefined optimization level like -O2 sets default values for -qcache suboptions. |
-qipa=clonearch | Instructs the compiler to generate duplicate subprograms with each tuned to a particular architecture. |
In addition to targeting the correct architecture for your application, it is important to select the right level of optimization. Combining the appropriate architecture settings with an optimization level that fits your application can vastly enhance performance. If you have not already done so, consult Optimizing XL compiler applications in addition to this section.
Using -qarch you can select a machine architecture or a family of architectures on which you can run your application. Selecting the correct -qarch suboption is crucial to influencing chip-level optimization as the choice of -qarch suboption controls:
Architecture selection is important at all optimization levels. Even at low optimization levels like -O0 and -O2, specifying the correct target architecture can be beneficial to performance. Specifying the correct target allows the compiler to select more efficient machine instructions and generate instruction sequences that perform best for a particular machine.
The -qarch suboptions allow you to specify individual processors or a family of processors with common instruction sets or subsets. The choice of processor gives you the flexibility of compiling your application to execute optimally on a particular machine, or to execute on a wide variety of machines while still applying as much architecture-specific optimization as possible. The less specific your choice of architecture, the fewer machine instructions are available to the compiler when generating code. A less specific choice can also limit the number of hardware intrinsic functions available to your application. A more specific choice of architecture can make more instructions and hardware intrinsic functions available. The XL Fortran Compiler Reference details the specific chip architectures and architecture families available.
When compiling your application, using a consistent or compatible -qarch setting for all files will ensure that you are getting the most from your architecture targets. If you are using -qipa link-time optimizations, the architecture setting you specify on the link step overrides the compile step setting.
You must ensure that your application executes only on machines that support your -qarch settings. Executing your application on other machines can produce incorrect results, even if your application appears to run without trapping. In some cases, -qarch suboptions are both individual targets and family targets because the instruction set of newer chips is a superset of the instruction set that earlier chips support. For example, the PWR3 -qarch setting safely targets PWR3, PWR4, PWR5, and even PPC970 systems because those processors support the complete base PWR3 instruction set.
If your application executes on a single type of processor, use the -qarch setting matching your target processor. If your application will run on multiple processor types, choose a -qarch setting with the largest common intersection of all the processors. You can do this by examining the instruction groups available to the processors and choosing a family setting that best represents it. The following table can assist you with that choice. Note that not all XL compilers support all architectures.
Available instructions for each -qarch suboption:

-qarch suboption | PowerPC | Graphics | Sqrt | 64-bit | PWR3 | PWR4 | PWR5 | VMX |
---|---|---|---|---|---|---|---|---|
ppc family | X | | | | | | | |
ppcgr family | X | X | | | | | | |
ppc64 family | X | | | X | | | | |
ppc64gr | X | X | | X | | | | |
ppc64grsq | X | X | X | X | | | | |
rs64b | X | X | X | X | | | | |
rs64c | X | X | X | X | | | | |
pwr3 chip and family | X | X | X | X | X | | | |
pwr4 chip and family | X | X | X | X | X | X | | |
pwr5 chip and family | X | X | X | X | X | X | X | |
pwr5x chip | X | X | X | X | X | X | X | |
ppc64v family | X | X | X | X | X | X | | X |
ppc970 chip | X | X | X | X | X | X | | X |
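The "largest common intersection" rule described before the table can be sketched in a few lines of Python. This is only an illustration of the selection logic, not a tool shipped with the compiler: the function name `common_qarch` and the dictionary `QARCH_GROUPS` are hypothetical, and the dictionary transcribes only a subset of the table rows.

```python
# Instruction groups per -qarch suboption, transcribed from a subset of the
# rows in the table above. The names below are illustrative, not a compiler API.
QARCH_GROUPS = {
    "ppc":       {"PowerPC"},
    "ppcgr":     {"PowerPC", "Graphics"},
    "ppc64grsq": {"PowerPC", "Graphics", "Sqrt", "64-bit"},
    "pwr3":      {"PowerPC", "Graphics", "Sqrt", "64-bit", "PWR3"},
    "pwr4":      {"PowerPC", "Graphics", "Sqrt", "64-bit", "PWR3", "PWR4"},
    "pwr5":      {"PowerPC", "Graphics", "Sqrt", "64-bit", "PWR3", "PWR4", "PWR5"},
}


def common_qarch(targets):
    """Return the richest suboption whose instruction groups every target supports."""
    if not targets:
        raise ValueError("at least one target is required")
    # Instruction groups that all target processors have in common.
    common = set.intersection(*(QARCH_GROUPS[t] for t in targets))
    # Suboptions that need nothing outside the common groups are safe choices.
    candidates = [name for name, groups in QARCH_GROUPS.items() if groups <= common]
    # Prefer the safe choice that exposes the most instruction groups.
    return max(candidates, key=lambda name: len(QARCH_GROUPS[name]))


print(common_qarch(["pwr4", "pwr5"]))           # pwr4
print(common_qarch(["ppcgr", "pwr3", "pwr5"]))  # ppcgr
```

For an application that must run on both PWR4 and PWR5 machines, the sketch selects pwr4, the most specific setting that both processors fully support.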
The default value for -qarch targets the broadest possible range of machines that the compiler supports. For example, the compiler will default to a setting of ppc64grsq. If you know that your code will execute only on POWER5 machines, avoid the default -qarch setting and choose at least PWR5 instead.
If you require optimal performance on multiple differing machines running the same copy of your application, you can use -qipa=clonearch. This option instructs the compiler to generate duplicate subprograms with each tuned to a particular architecture.
Other compiler options can influence the suboption selection for -qarch. The -q64 option forces an upgrade of the -qarch suboption to the minimum chip that can support 64-bit instructions. For example, on Linux, the setting is PPC64GRSQ. The -qarch=auto suboption is selected automatically when you compile at -O4 and -O5, and assumes that your compilation machine and your target execution machine are the same. For example, if you compile on a PWR5 machine and specify -O5, the -qarch setting defaults to PWR5. You can override this behavior by specifying the -qarch option after the -O4 or -O5 compiler options.
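The ordering rule above (the last -qarch setting wins, and -O4 or -O5 select -qarch=auto unless a later -qarch overrides it) can be modeled with a short Python sketch. This is a simplified model of the behavior the text describes, not the compiler's actual option parser. The function name `effective_qarch` is hypothetical, the default of ppc64grsq is taken from the example in the text, and the -q64 upgrade is left out.

```python
def effective_qarch(options, default="ppc64grsq"):
    """Scan options left to right; the last architecture setting seen wins."""
    arch = default
    for opt in options:
        if opt in ("-O4", "-O5"):
            # -O4 and -O5 select -qarch=auto (compile machine == target machine).
            # Assumption: this also replaces an earlier explicit -qarch.
            arch = "auto"
        elif opt.startswith("-qarch="):
            arch = opt.split("=", 1)[1]
    return arch


print(effective_qarch(["-O5"]))                        # auto
print(effective_qarch(["-O5", "-qarch=pwr4"]))         # pwr4
print(effective_qarch(["-qarch=pwr3", "-qarch=pwr5"])) # pwr5
```

Placing -qarch=pwr4 after -O5 keeps the -O5 optimizations while pinning the target architecture, which matters when the build machine is newer than the machines where the application runs.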
The -qtune option focuses optimizations for execution on a given processor without restricting the processor architectures that your application can execute on, generating machine instructions consistent with your -qarch architecture choice. Using -qtune also guides the optimizer in performing transformations, such as instruction scheduling, so that the resulting code executes most efficiently on your chosen -qtune architecture. The -qtune option tunes code to run on one particular processor architecture, and includes only specific processors as suboptions. The -qtune option does not support suboptions representing families of processors.
Use -qtune to specify the most common or critical processor where your application executes. For example, if your application usually executes on POWER5-based systems, but will sometimes execute on a POWER4-based system, specify -qtune=pwr5. The compiler generates code that executes more efficiently on a POWER5-based system, but will still run correctly on a POWER4-based system.
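As a rough illustration of the POWER5 and POWER4 scenario, the Python sketch below derives the flags from a count of deployed machines. The helper `arch_and_tune_flags`, the machine counts, and the source file name `app.f` are hypothetical. The sketch also assumes that -qarch is set to pwr4 as the common baseline, which the example above implies but does not state.

```python
def arch_and_tune_flags(machine_counts, baseline_arch):
    """Tune for the most common processor; generate code for the baseline."""
    most_common = max(machine_counts, key=machine_counts.get)
    return [f"-qarch={baseline_arch}", f"-qtune={most_common}"]


# Hypothetical deployment: mostly POWER5 systems, a few POWER4 systems.
machines = {"pwr5": 12, "pwr4": 2}
flags = arch_and_tune_flags(machines, baseline_arch="pwr4")

# Build (but do not run) an xlf command line; app.f is a placeholder name.
print(" ".join(["xlf", "-O3", *flags, "app.f"]))
# xlf -O3 -qarch=pwr4 -qtune=pwr5 app.f
```

The resulting command restricts instruction selection to what a POWER4 can execute while scheduling instructions for the POWER5.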
The default -qtune setting depends on the -qarch setting. If the -qarch option is set to a particular machine architecture, this limits the range of available -qtune suboptions, and the default tune setting will be compatible with the selected target processor. If the -qarch option is set to a family of processors, the range of values available for -qtune expands across that family, and the default is chosen from a commonly used machine in that family. If you compile with -qtune=auto, the default for optimization levels -O4 and -O5, the compiler detects the characteristics of the machine on which you are compiling, and assumes you want to tune for that type of machine. You can override this behavior by specifying -qtune after the -O4 or -O5 compiler options.
The -qcache option allows you to instruct the optimizer on the memory cache layout of your target architecture. There are several suboptions you can specify to describe cache characteristics such as:
The -qcache option is only effective if you understand the cache characteristics of the execution environment of your application. Before using -qcache, look at the options section of the listing file with the -qlist option to see if the current cache settings are acceptable. The settings appear in the listing when you compile with -qlistopt. If you are unsure about how to interpret this information, do not use -qcache, and allow the compiler to use default cache settings.
If you do not specify -qcache, the compiler makes cache assumptions based on your -qarch and -qtune settings. If you compile with the -qcache=auto suboption, the default at optimization levels -O4 and -O5, the compiler detects the cache characteristics of your compilation machine and tunes cache optimizations for that cache layout. If you do specify -qcache, also specify -qhot, or an option such as -O4 that implies -qhot. The optimizations that -qhot performs are designed to take advantage of your -qcache settings.
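The advice to pair -qcache with -qhot can be sketched as a small Python helper that assembles an option list. The function names are hypothetical and the cache geometry values are invented for illustration. The suboption names (level, type, size, line, assoc) are written as recalled from the compiler reference, so confirm them there before use. Treating -O5 like -O4 as implying -qhot is also an assumption of this sketch.

```python
def qcache_option(level, cache_type, size_kb, line_bytes, assoc):
    """Assemble a -qcache option from colon-separated suboptions."""
    parts = [
        f"level={level}",
        f"type={cache_type}",   # for example, d for a data cache
        f"size={size_kb}",      # cache size in KB
        f"line={line_bytes}",   # cache line size in bytes
        f"assoc={assoc}",       # set associativity
    ]
    return "-qcache=" + ":".join(parts)


def with_cache_tuning(options, cache_option):
    """Append the -qcache option, and -qhot unless something already implies it."""
    result = list(options) + [cache_option]
    implies_hot = any(
        opt == "-qhot" or opt.startswith("-qhot=") or opt in ("-O4", "-O5")
        for opt in result
    )
    if not implies_hot:
        result.append("-qhot")
    return result


# Hypothetical L1 data cache: 32 KB, 128-byte lines, 4-way set associative.
cache = qcache_option(1, "d", 32, 128, 4)
print(with_cache_tuning(["-O3"], cache))
# ['-O3', '-qcache=level=1:type=d:size=32:line=128:assoc=4', '-qhot']
```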
Consult the following list to ensure that you are getting the most out of your target machine options.