Using the Mathematical Acceleration Subsystem (MASS)

The MASS libraries consist of a library of scalar routines, described in Using the scalar library, and a set of vector libraries tuned for specific architectures, described in Using the vector libraries. The routines contained in both scalar and vector libraries are automatically called at certain levels of optimization, but you can also call them explicitly in your programs. Note that the accuracy and exception handling might not be identical in MASS routines and system library routines.

Compiling and linking a program with MASS describes how to compile and link a program that uses the MASS libraries, and how to selectively use the MASS scalar library routines in concert with the regular system library scalar routines.

Using the scalar library

The MASS scalar library, libmass.a (see note 1 below), contains an accelerated set of frequently used math intrinsic functions that provide improved performance over the corresponding standard system library functions. When you compile programs with any of the following options:

the compiler automatically uses the faster MASS routines for all scalar routines except atan2, dnint, sqrt, and rsqrt. The compiler first tries to vectorize calls to the scalar routines by replacing them with the MASS vector routines; if it cannot do so, it uses the MASS scalar routines. When you use these options, the compiler uses versions of the MASS routines contained in the system library libxlopt.a, so you do not need to add any special calls to the MASS routines in your code or link to the libxlopt library yourself.

Notes:
  1. On Linux, 32-bit and 64-bit objects cannot be combined in the same library, so two versions of the scalar library are shipped with the compiler: libmass.a for 32-bit applications, and libmass_64.a for 64-bit applications.

If you are not using any of these optimization levels, and/or want to explicitly call the MASS scalar routines, you can do so by linking the MASS scalar library libmass.a (or the 64-bit version, libmass_64.a) with your application (for instructions, see Compiling and linking a program with MASS). The MASS scalar routines all accept double-precision parameters and return a double-precision result, and are summarized in Table 9. All the MASS scalar routines except rsqrt are recognized by XL Fortran as intrinsic functions, so no explicit interface block is needed. To provide an interface block for rsqrt, include mass.include in your source file.

Table 9. MASS scalar library functions
Function  Description
sqrt      Returns the square root of x
rsqrt     Returns the reciprocal of the square root of x
exp       Returns the exponential function of x
expm1     Returns (the exponential function of x) - 1
log       Returns the natural logarithm of x
log1p     Returns the natural logarithm of (x + 1)
sin       Returns the sine of x
cos       Returns the cosine of x
tan       Returns the tangent of x
atan      Returns the arctangent of x
atan2     Returns the arctangent of x/y
sinh      Returns the hyperbolic sine of x
cosh      Returns the hyperbolic cosine of x
tanh      Returns the hyperbolic tangent of x
dnint     Returns the nearest integer to x (as a double)
x**y      Returns x raised to the power y

The following example shows the interface declaration for the rsqrt scalar function:

      interface

      real*8 function rsqrt (%val(x))
        real*8 x      ! Returns the reciprocal of the square root of x.
      end function rsqrt

      end interface
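With an interface like the one above in scope, an explicit call could look like the following minimal sketch. It assumes that mass.include contains only the interface block and can therefore appear in the specification part of a program; the program name rsqrt_demo and the test value are illustrative only.

```fortran
      program rsqrt_demo
      implicit none
      ! Assumption: mass.include supplies the rsqrt interface block.
      include 'mass.include'
      real*8 x, r
      x = 4.0d0
      ! MASS scalar routine: reciprocal of the square root of x.
      r = rsqrt(x)
      print *, 'rsqrt(x)      =', r
      ! Same quantity from the sqrt intrinsic, for comparison.
      print *, '1.0d0/sqrt(x) =', 1.0d0/sqrt(x)
      end program rsqrt_demo
```

If the file is saved as rsqrt_demo.f, it would be built in 32-bit mode with something like xlf rsqrt_demo.f -o rsqrt_demo -lmass. Both printed values should be close to 0.5, although the last bits may differ.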

The trigonometric functions (sin, cos, tan) return NaN (Not-a-Number) values for large arguments (abs(x)>2**50*pi).
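If an application can produce very large trigonometric arguments, it may be worth testing the argument against this limit before the call. The sketch below uses only standard Fortran; the names trig_guard and limit and the sample value of x are illustrative, not part of MASS.

```fortran
      program trig_guard
      implicit none
      real*8 pi, limit, x
      pi = 4.0d0*atan(1.0d0)
      ! Documented bound: NaN is returned when abs(x) > 2**50*pi.
      limit = (2.0d0**50)*pi
      ! Sample argument chosen to lie beyond the bound.
      x = 1.0d17
      if (abs(x) .gt. limit) then
         print *, 'argument outside the documented MASS range'
      else
         print *, 'sin(x) =', sin(x)
      end if
      end program trig_guard
```

How to handle an out-of-range argument (reject it, reduce it, or fall back to the system library) is an application decision; the sketch only reports it.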

Note:
In some cases, the MASS functions are not as accurate as the standard intrinsic functions, and they may handle edge cases differently (sqrt(Inf), for example).

Using the vector libraries

When you compile programs with any of the following options:

the compiler automatically attempts to vectorize calls to system math routines by calling the equivalent MASS vector routines (with the exception of the functions vatan2, vsatan2, vdnint, vdint, vsincos, vssincos, vcosisin, vscosisin, vqdrt, vsqdrt, vrqdrt, vsrqdrt, vpopcnt4, and vpopcnt8).

If you are not using any of these optimization levels, and/or want to explicitly call any of the MASS vector routines, you can do so by including massv.include in your source files to provide the interface declarations for the routines, and by linking to any of the following vector library archives (information on linking is provided in Compiling and linking a program with MASS):

libmassvp4.a
    Contains routines that have been tuned for the POWER4 architecture. If you are using a PowerPC 970 machine, this library is the recommended choice.
libmassvp5.a
    Contains routines that have been tuned for the POWER5 architecture.

On Linux(R), 32-bit and 64-bit objects must not be mixed in a single library, so a separate 64-bit version of each vector library is provided: libmassvp4_64.a and libmassvp5_64.a.

With the exception of a few routines (described below), all of the floating-point routines in the vector libraries accept three parameters and are of the form:

function_name (y,x,n)

where y is the output vector, x is the source vector, and n is the vector length. The parameters y and x are assumed to be double-precision for functions whose prefix is v, and single-precision for functions with the prefix vs. As an example, the following code:

include 'massv.include'

real*8 x(500), y(500)
integer n
n = 500
...
call vexp (y, x, n)

outputs a vector y of length 500 whose elements are exp(x(i)), with i=1,...,500.
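A complete program built around the same call might look like the sketch below. It assumes that massv.include can be included in the specification part after implicit none; the program name vexp_demo, the input values, and the comparison against the exp intrinsic are illustrative additions.

```fortran
      program vexp_demo
      implicit none
      ! Assumption: massv.include supplies the vector interfaces.
      include 'massv.include'
      integer n, i
      parameter (n = 500)
      real*8 x(n), y(n), maxdiff
      ! Fill the source vector with sample values 0.01, 0.02, ...
      do i = 1, n
         x(i) = 0.01d0*i
      end do
      ! One call computes y(i) = exp(x(i)) for i = 1, ..., n.
      call vexp (y, x, n)
      ! Compare with the scalar intrinsic, element by element.
      maxdiff = 0.0d0
      do i = 1, n
         maxdiff = max(maxdiff, abs(y(i) - exp(x(i))))
      end do
      print *, 'largest |vexp - exp| =', maxdiff
      end program vexp_demo
```

Saved as vexp_demo.f, this would be linked with one of the vector libraries, for example xlf vexp_demo.f -o vexp_demo -lmassvp4. A small nonzero difference is possible, because MASS accuracy is not guaranteed to match the system library.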

The routines vatan2, vdiv, and vpow take four parameters and are of the form routine_name(z,x,y,n). The routine vatan2 outputs a vector z whose elements are atan(x(i)/y(i)). The routine vdiv outputs a vector z whose elements are x(i)/y(i). The routine vpow outputs a vector z whose elements are x(i)**y(i). The routine vsincos also takes four parameters, of the form routine_name(y,z,x,n), and outputs two vectors, y and z, whose elements are sin(x(i)) and cos(x(i)) respectively.
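The argument order of the four-parameter routines is easy to get wrong, so the following sketch calls each of them once. It assumes that massv.include declares all four routines; the program name four_arg_demo, the array names s and c, and the sample data are illustrative.

```fortran
      program four_arg_demo
      implicit none
      ! Assumption: massv.include supplies the vector interfaces.
      include 'massv.include'
      integer n
      parameter (n = 4)
      real*8 x(n), y(n), z(n), s(n), c(n)
      data x /1.0d0, 2.0d0, 3.0d0, 4.0d0/
      data y /2.0d0, 2.0d0, 2.0d0, 2.0d0/
      ! Output vector first, then the two inputs, then the length.
      call vdiv (z, x, y, n)
      print *, 'x/y      :', z
      call vpow (z, x, y, n)
      print *, 'x**y     :', z
      call vatan2 (z, x, y, n)
      print *, 'atan(x/y):', z
      ! vsincos has two outputs (sine, cosine) and one input.
      call vsincos (s, c, x, n)
      print *, 'sin(x)   :', s
      print *, 'cos(x)   :', c
      end program four_arg_demo
```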

In vcosisin(y,x,n), x is a vector of n double elements and the routine outputs a vector y of n complex*16 elements of the form (cos(x(i)),sin(x(i))).
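A call to vcosisin could be sketched as follows, assuming massv.include declares the routine. The program name cosisin_demo and the sample inputs are illustrative.

```fortran
      program cosisin_demo
      implicit none
      ! Assumption: massv.include supplies the vector interfaces.
      include 'massv.include'
      integer n, i
      parameter (n = 3)
      real*8 x(n)
      complex*16 y(n)
      data x /0.0d0, 0.5d0, 1.0d0/
      ! y(i) receives (cos(x(i)), sin(x(i))) as one complex value.
      call vcosisin (y, x, n)
      do i = 1, n
         ! Print the input, then the real and imaginary parts.
         print *, x(i), dble(y(i)), aimag(y(i))
      end do
      end program cosisin_demo
```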

The single-precision and double-precision floating-point routines contained in the vector libraries are summarized in Table 10.

Table 10. MASS floating-point vector library functions
Double-precision  Single-precision  Arguments  Description
function          function
vacos             vsacos            (y,x,n)    Sets y(i) to the arccosine of x(i), for i=1,..,n
vasin             vsasin            (y,x,n)    Sets y(i) to the arcsine of x(i), for i=1,..,n
vatan2            vsatan2           (z,x,y,n)  Sets z(i) to the arctangent of x(i)/y(i), for i=1,..,n
vcbrt             vscbrt            (y,x,n)    Sets y(i) to the cube root of x(i), for i=1,..,n
vcos              vscos             (y,x,n)    Sets y(i) to the cosine of x(i), for i=1,..,n
vcosh             vscosh            (y,x,n)    Sets y(i) to the hyperbolic cosine of x(i), for i=1,..,n
vcosisin          vscosisin         (y,x,n)    Sets the real part of y(i) to the cosine of x(i) and the imaginary part of y(i) to the sine of x(i), for i=1,..,n
vdint             -                 (y,x,n)    Sets y(i) to the integer truncation of x(i), for i=1,..,n
vdiv              vsdiv             (z,x,y,n)  Sets z(i) to x(i)/y(i), for i=1,..,n
vdnint            -                 (y,x,n)    Sets y(i) to the nearest integer to x(i), for i=1,..,n
vexp              vsexp             (y,x,n)    Sets y(i) to the exponential function of x(i), for i=1,..,n
vexpm1            vsexpm1           (y,x,n)    Sets y(i) to (the exponential function of x(i))-1, for i=1,..,n
vlog              vslog             (y,x,n)    Sets y(i) to the natural logarithm of x(i), for i=1,..,n
vlog10            vslog10           (y,x,n)    Sets y(i) to the base-10 logarithm of x(i), for i=1,..,n
vlog1p            vslog1p           (y,x,n)    Sets y(i) to the natural logarithm of (x(i)+1), for i=1,..,n
vpow              vspow             (z,x,y,n)  Sets z(i) to x(i) raised to the power y(i), for i=1,..,n
vqdrt             vsqdrt            (y,x,n)    Sets y(i) to the 4th root of x(i), for i=1,..,n
vrcbrt            vsrcbrt           (y,x,n)    Sets y(i) to the reciprocal of the cube root of x(i), for i=1,..,n
vrec              vsrec             (y,x,n)    Sets y(i) to the reciprocal of x(i), for i=1,..,n
vrqdrt            vsrqdrt           (y,x,n)    Sets y(i) to the reciprocal of the 4th root of x(i), for i=1,..,n
vrsqrt            vsrsqrt           (y,x,n)    Sets y(i) to the reciprocal of the square root of x(i), for i=1,..,n
vsin              vssin             (y,x,n)    Sets y(i) to the sine of x(i), for i=1,..,n
vsincos           vssincos          (y,z,x,n)  Sets y(i) to the sine of x(i) and z(i) to the cosine of x(i), for i=1,..,n
vsinh             vssinh            (y,x,n)    Sets y(i) to the hyperbolic sine of x(i), for i=1,..,n
vsqrt             vssqrt            (y,x,n)    Sets y(i) to the square root of x(i), for i=1,..,n
vtan              vstan             (y,x,n)    Sets y(i) to the tangent of x(i), for i=1,..,n
vtanh             vstanh            (y,x,n)    Sets y(i) to the hyperbolic tangent of x(i), for i=1,..,n

The integer routines are of the form function_name (x, n), where x is a vector of 4-byte (for vpopcnt4) or 8-byte (for vpopcnt8) numeric objects (integer or floating-point), and n is the vector length. The vector integer routines are summarized in Table 11.

Table 11. MASS integer vector library functions
vpopcnt4
    Description: Returns the total number of 1 bits in the concatenation of the binary representation of x(i), for i=1,...,n, where x is a vector of 32-bit objects.
    Interface:
        integer*4 function vpopcnt4 (x, n)
        integer*4 x(*), n
vpopcnt8
    Description: Returns the total number of 1 bits in the concatenation of the binary representation of x(i), for i=1,...,n, where x is a vector of 64-bit objects.
    Interface:
        integer*4 function vpopcnt8 (x, n)
        integer*8 x(*)
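As an illustration of the integer routines, the sketch below counts the 1 bits in a short vector with vpopcnt4. It assumes that massv.include declares vpopcnt4 as an integer*4 function, as in the interface shown in Table 11; the program name popcnt_demo and the sample data are illustrative.

```fortran
      program popcnt_demo
      implicit none
      ! Assumption: massv.include declares the vpopcnt4 function.
      include 'massv.include'
      integer n
      parameter (n = 3)
      integer*4 x(n), total
      ! Binary 1, 11, and 111: one, two, and three 1 bits.
      data x /1, 3, 7/
      ! Total number of 1 bits across all n elements.
      total = vpopcnt4 (x, n)
      print *, 'total 1 bits =', total
      end program popcnt_demo
```

Going by the description in Table 11, the printed total for this input should be 6.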

The following example shows interface declarations for some of the MASS double-precision vector routines:

interface

subroutine vsqrt (y, x, n)
  real*8 y(*), x(*)
  integer n        ! Sets y(i) to the square root of x(i), for i=1,..,n
end subroutine vsqrt

subroutine vrsqrt (y, x, n)
  real*8 y(*), x(*)
  integer n        ! Sets y(i) to the reciprocal of the square root of x(i),
                   ! for i=1,..,n
end subroutine vrsqrt

end interface

The following example shows interface declarations for some of the MASS single-precision vector routines:

interface

subroutine vssqrt (y, x, n)
  real*4 y(*), x(*)
  integer n       ! Sets y(i) to the square root of x(i), for i=1,..,n
end subroutine vssqrt

subroutine vsrsqrt (y, x, n)
  real*4 y(*), x(*)
  integer n       ! Sets y(i) to the reciprocal of the square root of x(i),
                   ! for i=1,..,n
end subroutine vsrsqrt

end interface

Overlap of input and output vectors

Normally, Fortran subroutine calls should pass only parameters that are disjoint, meaning that they do not overlap in memory. However, in calls to the MASS vector routines, this restriction is relaxed, and applications can use the same vector for both input and output parameters (for example, vsin (y, y, n)). Other kinds of overlap (where input and output vectors are neither disjoint nor identical) should be avoided, since they may produce unexpected results.
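The permitted case, in which the input and output vectors are identical, can be sketched as follows. The program assumes massv.include declares vsin; the name inplace_demo and the sample data are illustrative, and the partially overlapping call is shown only in a comment as a pattern to avoid.

```fortran
      program inplace_demo
      implicit none
      ! Assumption: massv.include supplies the vector interfaces.
      include 'massv.include'
      integer n
      parameter (n = 4)
      real*8 y(n)
      data y /0.0d0, 0.5d0, 1.0d0, 1.5d0/
      ! Permitted: the same vector is both input and output.
      call vsin (y, y, n)
      print *, y
      ! Avoid: vectors that overlap without being identical, e.g.
      !   call vsin (y(2), y(1), n-1)
      end program inplace_demo
```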

Consistency of MASS vector routines

All of the routines in the MASS vector libraries are consistent, in the sense that a given input value will always produce the same result, regardless of its position in the vector, and regardless of the vector length.
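One way to observe this property is to compare the results of a single long call with the results of many calls of length 1. The sketch below does this for vexp, assuming massv.include declares the routine; the program name consistency_demo and the helper arrays x1 and y1 are illustrative.

```fortran
      program consistency_demo
      implicit none
      ! Assumption: massv.include supplies the vector interfaces.
      include 'massv.include'
      integer n, i
      parameter (n = 100)
      real*8 x(n), y(n), x1(1), y1(1)
      logical same
      do i = 1, n
         x(i) = 0.1d0*i
      end do
      ! One call over the whole vector.
      call vexp (y, x, n)
      ! Recompute each element alone, as a vector of length 1.
      same = .true.
      do i = 1, n
         x1(1) = x(i)
         call vexp (y1, x1, 1)
         ! Exact comparison is intended: results should be identical.
         if (y1(1) .ne. y(i)) same = .false.
      end do
      print *, 'results identical:', same
      end program consistency_demo
```

The exact equality test is deliberate here, because the stated guarantee is that the results are the same, not merely close.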

Compiling and linking a program with MASS

To compile an application that calls the functions in the MASS libraries, specify the libraries on the -l linker option: mass and massvp4 (or massvp5) for 32-bit applications, or mass_64 and massvp4_64 (or massvp5_64) for 64-bit applications. For example, if the MASS libraries are installed in the default directory, you could specify one of the following:

xlf progf.f -o progf -lmass -lmassvp4
xlf progf.f -o progf -lmass_64 -lmassvp4_64 -q64

The MASS routines must run in the round-to-nearest rounding mode and with floating-point exception trapping disabled. (These are the default compilation settings.)

Using libmass.a with the math system library

If you wish to use the libmass.a (or libmass_64.a) scalar library for some functions and the system library for other functions, follow this procedure to compile and link your program:

  1. Use the ar command to extract the object files of the desired functions from libmass.a or libmass_64.a. For most functions, the object file name is the function name followed by .s32.o (for 32-bit mode) or .s64.o (for 64-bit mode); the exceptions are listed after this procedure. For example, to extract the object file for the tan function in 32-bit mode, the command would be:
    ar -x libmass.a tan.s32.o
  2. Archive the extracted object files into another library:
     ar -qv libfasttan.a tan.s32.o 
     ranlib libfasttan.a 
  3. Create the final executable using xlf, specifying -lfasttan instead of -lmass:
    xlf sample.f -o sample -Ldir_containing_libfasttan.a -lfasttan
    Here dir_containing_libfasttan.a stands for the directory that contains libfasttan.a. This links only the tan function from MASS (now in libfasttan.a) and the remainder of the math functions from the standard system library.
Exceptions:
  1. The sin and cos functions are both contained in each of the object files sincos.s32.o and sincos.s64.o.
  2. The ** (exponentiation) operator is contained in the object files dxy.s32.o and dxy.s64.o.
Note:
Both MASS cos and sin functions are automatically linked if you export either one.